
TESTING  PROGRAMS 

FOR 

SECONDARY  SCHOOLS 


By 


J.  MURRAY  LEE 


Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
in the Faculty of Philosophy, Columbia University.


Burbank,  California 
1934 


Copyright  1934,  by  J.  Murray  Lee 


All  rights  reserved.  Permission  to  reproduce 
any  part  of  this  material  in  any  form  must  be 
obtained  from  the  author,  due  to  probable 
future  commercial  publication. 


Printed  in  the  United  States  of  America 



CONTENTS 


CHAPTER  PAGE 

Part  I Importance  and  Development  of  Testing 

I.  The  Importance  of  Testing 1 


II.  Development  of  Testing  in  the  High  School  . . 11 

1.  Early  beginnings  ...........  11 

2.  Mental  testing 11 

3.  Achievement  testing 13 

4.  Contributing  agencies 14 

5.  New-type  or  objective  tests 17 

6.  Increase  in  number  of  high  school  tests  ......  20 

7.  Improvement  in  quality  of  high  school  tests 23 

8.  Summary  .........  25 


Part  II  Administration  of  Testing 

III.  Planning  the  Program  of  Standardized  Tests  . 26 


1.  What  types  of  standardized  tests  are  available  for  use?  ...  26 

2.  What  types  of  testing  programs  are  the  schools  conducting  . . 32 

3.  Who  should  be  responsible  for  the  selection  of  tests?  ...  36 

4.  Summary  39 


IV.  Administration  of  Intelligence  Tests  ...  42 


1.  What  per  cent  of  pupils  should  be  given  intelligence  tests?  . . 42 

2.  How  often  should  intelligence  tests  be  given?  ....  44 

3.  In  what  grades  should  intelligence  tests  be  given?  ....  46 

4.  At  what  time  during  the  year  should  intelligence  tests  be  given  . 48 

5.  Who  should  be  given  individual  intelligence  tests?  ....  48 

6.  What  intelligence  tests  should  be  used? 51 

7.  How  comparable  are  results  of  different  tests 56 

8.  Who  should  give  intelligence  tests?  .......  59 

9.  Who  should  score  the  tests?  .........  62 

10.  In  what  form  should  test  results  be  made  available?  ...  64 

11.  How  should  I.  Q.s  be  figured? 67 

12.  How  should  intelligence  test  results  be  recorded?  ....  72 

13.  To  whom  should  intelligence  test  results  be  made  available  . 74 

14.  What  uses  are  made  of  intelligence  tests? 77 


V.  Administration  of  Standardized  Achievement  Tests  80 

1.  What  are  the  relative  merits  of  the  standardized  achievement  tests 

as  compared  with  the  teacher-made  tests? 80 

2.  What  is  the  relative  emphasis  placed  on  standardized  achievement 

testing  by  the  various  subject  matter  departments?  ...  85 

3.  When  should  standardized  achievement  tests  be  given?  . . 89 

4.  Who  should  give  standardized  achievement  tests?  ...  91 

5.  Who  should  score  the  standardized  achievement  tests  . . 92 

6.  To  whom  should  standardized  achievement  tests  results  be  made 

available  94 

7.  How  should  standardized  achievement  test  results  be  recorded  . 96 

8.  In  what  form  should  the  test  results  be  made  available  . . 98 

9.  What  uses  are  made  of  standardized  achievement  tests?  . . 99 

VI.  Administration  of  Teacher-Made  Tests  . . . 102 

1.  What  constitutes  a good  test? 102 

2.  What  are  the  relative  merits  of  the  objective  and  essay  tests?  . 103 

3.  What  are  the  possibilities  of  cooperative  testing  in  the  school?  . 105 

4.  What  steps  should  be  followed  in  constructing  teacher-made  tests?  106 

5.  What  are  likely  to  be  the  faults  of  teacher-made  objective  tests?  . 110 

6.  What  provisions  should  be  made  for  duplicating  tests?  . . 112 

7.  What  are  the  uses  which  are  made  of  objective  and  essay  tests?  . 113 

VII.  Administration  of  Testing — A Brief  Summary  . 117 


ACKNOWLEDGMENTS 


A study  of  this  type  is  made  possible  only  through  the  cooperation  of  a large 
number of school people. I am sincerely grateful to the principals, teachers,
and research directors who so generously used their time to supply the
information requested.

The  suggestions  and  counsel  of  Dr.  Edwin  H.  Reeder  and  Dr.  John  K. 
Norton,  who  acted  as  advisers  on  my  committee,  were  of  great  assistance.  I 
wish  to  thank  them  for  their  help  and  consideration. 

My  greatest  debt  of  gratitude  in  the  conduct  of  the  study  is  to  Dr.  Percival 
M.  Symonds  and  Dr.  Grayson  N.  Kefauver,  who  separately  sponsored  the 
problem.  I am  indebted  to  Dr.  Symonds  for  his  encouragement  and  guidance 
at  the  beginning  of  the  problem,  and  for  his  suggestions  as  to  the  scope  and 
need  for  the  investigation;  and  to  Dr.  Kefauver  for  his  willingness  to  counsel 
after  the  problem  was  under  way,  and  for  his  most  excellent  help  and  advice 
as  to  the  method  of  assembling  and  reporting  the  data. 

Anyone  having  worked  in  the  Secondary  Education  department  owes  much 
to  the  influence  of  Dr.  Thomas  H.  Briggs,  but  I feel  an  especial  debt  for  his 
encouragement  and  guidance  in  the  continuation  of  my  graduate  study. 

J.  M.  L. 


TABLES 


Title 

Table  Page 

I.  An  Analysis  of  the  Copyright  Dates  of  Tests  Suitable  for  Use  in 

Grades  9-12  Which  Were  Commercially  Available  in  1932  . . 18 

II.  An  Analysis  of  the  Development  of  High  School  Tests  in  Terms 

of  the  Per  Cent  of  Tests  in  Each  Subject  Distributed  According 

to  Date  of  Publication  20 

III.  A Comparison  of  the  Development  of  Standardized  and  Non- 

Standardized  Tests 22 

IV.  The  Per  Cent  of  Junior,  Senior,  and  Six  Year  High  Schools  Using 

Various  Types  of  Standardized  Tests 27 

V.  The  Per  Cent  of  Schools  Following  Certain  Testing  Practices,  Analyzed
According  to  Type  of  School  and  Size  of  Community  . . 33 

VI.  The  Per  Cent  of  Schools  Giving  Various  Types  of  Tests  . . .35 

VIII.  The  Percentage  of  Pupils  upon  Which  Group  and  Individual  Intelligence
Test  Results  Are  Available  .......  43 

IX.  Correlations  of  a Group  and  Individual  Intelligence  Test  with 

Achievement  Tests  at  Different  Timed  Intervals  . . . .45 

X.  Per  Cent  of  Schools  Using  Various  Plans  for  Testing  . . .47 

XI.  Type  of  Cases  Given  Individual  Intelligence  Tests  . . . .50 

XII.  Ranking  of  Group  Intelligence  Tests  According  to  Frequency  of 

Mention  and  Number  of  Times  Given 54 

XIII.  I.  Q.s  Equivalent  to  Terman  I.  Q.s  of  60,  100,  and  140  on  Several 

Group  Intelligence  Tests  for  Three  Levels  of  Ability  . . .58 

XIV.  Differences  Between  I.  Q.s  for  Some  Combinations  of  Group  Intelligence
Tests  and  the  Criterion  I.  Q.s 59 

XV.  Median  I.  Q.s  of  the  Four  Years  of  High  School  Calculated  by  Using 

Age  14  and  Age  16  as  Upper  Limits 68 

XVI.  A Method  of  Figuring  the  Number  of  Months  Since  Last  Birthday  . 69 
XVII.  A Section  of  the  Inglis  Intelligence  Quotient  Values  Illustrating  a 

Rapid  Method  of  Figuring  I.  Q.s 71 

XVIII.  The  Number  of  Schools  Giving  Tests  in  the  Various  Fields  and  the 

Number  of  Tests  Reported  Given 86 

XIX.  Median  Number  per  Teacher  of  Standardized  Achievement  Tests  and 
Teacher-Made  Tests  Given  During  One  Semester  Ending  February 
1932  83 

Figure 

I.  A Reproduction  of  Page  6 from  the  Thorndike-McCall  Reading 
Scale.  Form  I.  .........  23 




TESTING  PROGRAMS  FOR  SECONDARY  SCHOOLS 


PART  I 

IMPORTANCE  AND  DEVELOPMENT  OF  TESTING 

CHAPTER  I 

THE  IMPORTANCE  OF  TESTING 

Testing  in  some  form  has  always  been  an  integral  part  of 
our  educational  program.  Demonstrations  of  skill,  answering 
oral  questions,  writing  answers,  or  underlining  responses  are 
all  methods  of  testing.  The  favorite  plan  for  many  years  of 
evaluating  the  students’  knowledge  was  on  the  basis  of  oral 
replies  to  verbal  questions,  but  the  defects  of  that  method 
were  pointed  out  by  Horace  Mann  in  1845  when  he  argued 
for  the  use  of  written  examinations.  He  said,  “We  venture  to 
predict,  that  the  mode  of  examination, by  printed  questions 
and  written  answers  will  constitute  a new  era  in  the  history 
of  our  schools”  (1).  His  prophecy  was  not  fulfilled  until  over 
half  a century  later  when  the  standardized  intelligence  and 
achievement  tests  were  introduced.  During  the  last  sixteen 
years  intelligence,  standardized  achievement,  and  new  type 
objective testing have become increasingly popular in our
secondary schools.

There were over six million standardized intelligence or
achievement tests given in the secondary schools during
1931 (2). A recent study (3) has shown that the number of
articles dealing with measurement that appeared in thirteen
leading educational periodicals increased from 4 for the decade
beginning in 1890 to 844 for the ten years between 1920 and
1929.

1. Mann, Horace. “Comments on the Report of the Examining Committee of
the Boston Grammar and Writing Schools.” The Common School Journal, 7:330,
November 1, 1845. Italics appear in the original.

2. This figure is based on the replies to the check list which the writer sent
out and on the information available from publishers.

3. Franke, Paul R. and Davis, Robert A. “Changing Tendencies in Educational
Research.” Journal of Educational Research, 23:133-145, February, 1931.




These  few  facts  are  among  those  indicating  the  important 
position  that  measurement  has  come  to  assume  in  our  schools. 
When  any  item  in  our  schools  looms  as  large  as  this,  one 
should  begin  to  ask  questions  concerning  its  value.  The  most 
important questions are: What is the function of measurement,
and what place should it have in the educational process?
Measurement is only a tool. The recognition of this
concept enables one better to understand and evaluate much
that is claimed for it. As a tool, it can furnish answers to such
questions as: How much of the subject does a pupil or a class
know? What parts of the subject are understood? Which pupils
are in need of special help? These questions are very similar
to the ones the carpenter can answer with his yardstick,
such as: How long are the boards? Which boards need to be
repaired?

Another  way  of  thinking  of  measurement  is  as  a means  of 
diagnosis.  It  cannot  provide  a cure.  It  occupies  the  same  place 
in  the  educational  process  as  is  occupied  by  the  doctor’s 
instruments  of  diagnosis  in  the  health  process.  Having  this 
idea  of  the  function  of  testing,  one  will  no  more  expect  the 
mere  giving  of  tests  to  result  in  any  change  in  the  pupil  than 
one would expect the patient to improve after a doctor’s diagnosis.
It is the treatment which takes place on the basis of the
diagnosis that will determine whether the patient will improve,
and the same conclusion may be drawn in the field of
measurement.

Measurement and the Evaluation of Instruction. Tests
make an important tool with which the teacher may find
wherein her teaching has been effective, wherein are weaknesses
of individual students, and wherein the achievements
of her classes equal or fall below those of other classes.

Thorndike  stressed  the  need  for  teachers  to  measure  the 
results  of  their  instruction: 


“It will be said that the energy of teachers should be devoted to making
achievements great rather than to measuring how great they are. It is true that
for many teachers and many students it is wise to teach and learn as well as
may be, leaving the results to faith and hope or even charity. Moreover, there
are gifted personalities to whom scientific and business-like procedures are
alien and even odious, and who should not be required to measure what they
are doing or even in the ordinary sense of the word, to know what they
are doing. Their genius is better than efficiency. There are, however, not
enough of these to be more than a negligible factor in, say, the teaching of
freshman English or first-year anatomy or the Law of Contracts” (4).

Measurement and Guidance. The value of tests in guidance
lies in the effectiveness with which they will predict
the future performance of the pupil. This is a different concept
from that of measuring what a pupil has achieved. Thus,
for purposes of guidance, one is no longer interested in how
closely a test measures the objectives of the course. The validity
of the guidance test is determined by its power to predict.

In  a most  comprehensive  study  of  the  guidance  program 
Koos  and  Kefauver  state  that: 

“The modern program of guidance endeavors to obtain accurate characterizations
of individuals . . . Perfect measures are not available and factors other than
the possession of ability condition student’s probable success. None the less,
the measurements available supplement in significant ways the other types of
evidence” (5).

The effectiveness of guidance grows as our ability to successfully
predict increases. Considerable evidence is developing
regarding the value of the different types of tests in predicting
success in various fields. Some of the recently published
aptitude or prognosis tests for foreign languages and
mathematics have sufficiently high correlations with achievement
to be of great assistance in predicting success or failure
in their respective subjects.

At this point the caution needs to be injected that all predictions
of success are made on the basis of the status quo. An
algebra prognosis test will predict the probable success of
pupils in the algebra course as it is taught today, but if the
course were radically changed, new instruments for prediction
might have to be developed. Thus, as our educational process
changes, the instruments and tools which we use in that process
should probably change.

Measurement and Individual Differences. Measurement
can provide the teacher with an immediate knowledge, at the
beginning of the term, concerning some of the differences between
individuals. These differences are indications of variations
in pupils’ abilities which will persist throughout the term.

4. Thorndike, Edward L. “Intelligence Tests and Their Uses.” Twenty-first
Yearbook of the National Society for the Study of Education, Bloomington,
Illinois: Public School Publishing Company, 1923, pp. 8-9.

5. Koos, L. V. and Kefauver, G. N. Guidance in Secondary Schools, New
York: The Macmillan Company, 1932, pp. 280-281.




By tests now available the teachers in most high school subjects
can distinguish between the pupils with the greatest
ability and those with the least. All that tests can do is to
point out the individual differences which exist. Once the
teacher knows the strengths and the weaknesses of her pupils, it
is her job to adjust the instruction to their needs. Ability
grouping on the basis of test results does not provide for individual
differences, but differentiated teaching in the various
groups may provide for them.

The  concept  of  teaching  has  changed  since  measurement 
has  been  responsible  for  bringing  individual  differences  to  the 
fore  of  educational  thought.  The  teacher  is  no  longer  teaching 
geometry to a class of pupils who, according to the old concept,
are all equally capable. Instead, the tendency is toward
each  pupil  being  taught  the  kind  of  geometry  he  is  able  to 
learn.  If  the  pupil  has  great  capabilities  in  this  line,  his  ability 
now  can  be  recognized  by  giving  an  aptitude  test  at  the 
beginning  of  the  year.  Work  may  then  be  given  him  which  is 
stimulating  and  challenging. 

Measurement and Supervision. A knowledge of definite results
which have been attained in the various subjects is of
utmost importance if supervision is to be effective. Tests have
been able to supply some of the facts and have been used to a
rather large extent by supervisors, especially in the elementary
grades. One of the main faults found in the use of test
results by school supervisors is the stress placed on the class
average. When the supervisor judges a teacher on the basis
of a test average in English, mathematics, or foreign language,
the teacher is going to accept that basis as the goal for her
teaching. The supervisor must make it apparent to the teacher
that the test is merely a tool for measuring a relatively
small  area  in  the  teaching  process.  It  is  also  the  duty  of  the 
supervisor  to  show  the  teacher  how  to  make  use  of  the  tool 
in  studying  all  the  pupils  in  her  class,  not  alone  the  average. 
A suggestion  has  been  made  that  instead  of  finding  the  class 
average  the  pupils  in  the  upper  and  lower  fourths  of  the  group 
be  studied.  The  pupils  in  the  upper  group  should  be  given 
more  advanced  work  which  will  prove  challenging  to  them 
and  the  ones  in  the  lower  group  remedial  work  to  correct 
their  difficulties. 
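The suggestion above, studying the upper and lower fourths rather than the class average, might be sketched as follows. This is only an illustration: the function name, the pupil labels, and the scores are hypothetical and do not come from the study, and the rule of taking one quarter of the class by rank (rounded down, at least one pupil) is an assumption about how "fourths" would be formed.

```python
def upper_and_lower_fourths(scores):
    """Split a class into its lowest and highest fourths by test score.

    `scores` maps a pupil label to a test score. The size of each fourth
    is the class size divided by four, rounded down, with a minimum of one
    pupil (an assumed convention). Tied scores keep their original order.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    k = max(1, len(ranked) // 4)
    lower = [name for name, _ in ranked[:k]]   # candidates for remedial work
    upper = [name for name, _ in ranked[-k:]]  # candidates for advanced work
    return lower, upper


# Hypothetical class of eight pupils.
class_scores = {"A": 52, "B": 71, "C": 64, "D": 88,
                "E": 45, "F": 93, "G": 77, "H": 60}
lower, upper = upper_and_lower_fourths(class_scores)
print("Lower fourth:", lower)
print("Upper fourth:", upper)
```

Nothing here computes a class average; the output names the pupils to be studied, which is the shift in emphasis the paragraph recommends.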




To a certain extent, research departments have perpetuated
the emphasis upon the average in their methods of reporting
results. Consider the possibilities for the constructive use of
tests which may be developed because of difference in emphasis
in Cases A and B. These cases are adapted from reports
of testing done by two research departments. Case A is typical
of a large number of such bulletins published each year
and illustrates why teachers may come to dislike testing programs.

Case  A. 

“The report of the seventh grade language test given in January is as
follows:

The city average: 7.9

The average for the various seventh grades in each school is given below:

Lincoln School       8.2
Miller School        8.0
McKinley School      7.8
Washington School    7.6

It would appear possible for the McKinley and Washington schools to do
much better work.”

Case  B. 

“Below are concrete cases analyzed and discussed to show you how to
interpret the results of the language test given recently.

Mary, a 7B pupil, has percentile scores as follows:

Test 1    30
Test 2    94
Test 3    48
Test 4    87
Test 5    12
Test 6    76

Her grade placement on the total is 7.7 or three months above the average
of pupils in the same grade in the city.

She is especially weak in Tests 1 and 5. The remedial exercises in punctuation
should be of help in improving the score on Test 5. She has a most excellent
vocabulary as shown by the high score in Test 2. With comparatively little
help she should be able to overcome her weaknesses.”
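The kind of reading given to Mary's scores in Case B can be sketched in a few lines. The percentile scores are those quoted in the report; the function name and the two cutoffs (below the 35th percentile counted as weak, at or above the 90th counted as a marked strength) are assumptions chosen only so that the sketch reproduces the report's own remarks, and are not taken from the research department's bulletin.

```python
# Percentile scores for Mary as quoted in Case B (test number -> percentile).
MARY_PERCENTILES = {1: 30, 2: 94, 3: 48, 4: 87, 5: 12, 6: 76}


def read_profile(percentiles, weak_below=35, strong_at_or_above=90):
    """Pick out the subtests that call for remedial work or show strength.

    Both cutoffs are hypothetical; a school would set its own.
    Returns (weak_tests, strong_tests), each sorted by test number.
    """
    weak = sorted(t for t, p in percentiles.items() if p < weak_below)
    strong = sorted(t for t, p in percentiles.items()
                    if p >= strong_at_or_above)
    return weak, strong


weak, strong = read_profile(MARY_PERCENTILES)
print("Needs remedial work on tests:", weak)
print("Marked strength on tests:", strong)
```

With these assumed cutoffs the sketch singles out Tests 1 and 5 as weak and Test 2 as strong, matching the report. The attention stays on one pupil's profile, and no comparison between classes is made.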

These two examples show rather striking extremes in
stressing the use of tests. Case A places an undesirable emphasis
upon differences between classes in various schools.
These differences might be due to the ability of the classes,
which fact is not given recognition, but in any case such
comparison would be relatively poor practice. This method of
reporting stresses teacher-results, and the teacher is only too
conscious of how well she did in relation to the other teachers.
In Case B, the emphasis is all placed on the diagnosis of the
pupils. No comparisons between classes are presented. Instead,
the attention of the teacher is focussed on the interpretation
of the test results of each pupil in her class. It is not very
difficult to imagine the difference in the attitudes of teachers
towards the testing program in the two cities.

The  major  questions  which  the  supervisor  should  consider 
in relation to the measuring program are, according to Alberty
and Thayer, the following:

“1.  What  are  the  valid  learning  products,  in  terms  of  changes  in  behavior, 
which  the  school  should  seek  to  achieve? 

“2.  What  devices  are  most  suitable  for  the  measurement  of  these  products? 

“3. How should these measures of pupil achievement be used so as to
promote satisfactory pupil and teacher growth in a changing social order?” (6)

These three questions, if answered in the thinking and practice
of supervisors, would lead to much better use of tests as a
tool  to  improve  learning. 

Measurement and Personality Diagnoses. The attention of
the school has focussed upon the measurement of achievement
and ability. The result of such attention has been the development
and wide use of standardized achievement and intelligence
tests. Such a program is deficient in that no information
is available on the attitudes and character traits which
are developing. These educational outcomes have been
neglected in the measurement program.

A small  beginning  has  been  made  in  the  measuring  of  the 
“concomitants”  of  school  work,  but  much  needs  to  be  done 
before  the  school  can  use  the  instruments  with  assurance. 

There are four areas where diagnosis of conduct is needed.
These are, according to Symonds (7), diagnosis of crime,
insanity, vocational incompetence, and citizenship. All of these
areas are important in the school program. In the field of crime
the school is interested in tests of predelinquency. It would be
most desirable to recognize delinquent behavior in the incipient
stage before it becomes a serious problem. Insanity has
its beginnings in neurotic trends and emotional maladjustments
of children. If it were possible to discover these problem
cases, a great saving could be made by correcting them
before they became serious. In the vocational field, measures
of interest and certain character traits are needed. As regards
citizenship, measures of attitudes, leadership, and character
are desired.

6. Alberty, H. B. and Thayer, V. T. Supervision in the Secondary School.
Boston: D. C. Heath and Company, 1931, p. 332.

7. Symonds, Percival M. Diagnosing Personality and Conduct, New York:
The Century Company, 1931, pp. 8-12.

To a limited extent, there are available tests which measure
certain attitudes, interests, adjustments, and character
traits in pupils. A summary of the work in this field is found
in Symonds’ Diagnosing Personality and Conduct (8). Such
measures  should  be  used  with  care  and  only  by  one  who 
understands  their  present  limitations. 

Measurement and Marking. Marks are used in most secondary
schools throughout the country. The issue discussed is
not whether marks should be given, but rather, if marks are
given, how they may be given meaning. It is becoming increasingly
accepted that “final marks for a semester . . . should
represent the best possible estimate of achievement and status
in the subject” (9). Douglas also says, “If it is desired to reward
industry, citizenship, or achievement in proportion to ability,
supplementary marks should be used” (10).

The “best possible estimate” of achievement can be obtained
by using tests, either teacher-made or standardized. If
tests are not used, the estimate of achievement is likely to contain
many factors other than the one on which the mark is supposed
to be based. The tests should be as objective (11) as possible.
It  has  been  demonstrated  in  a number  of  experiments  that 
different  teachers  do  not  mark  the  same  essay  test  alike. 
Standardized  achievement  and  teacher-made  objective  tests 
are  excellent  devices  for  the  improvement  of  the  marking 
system. 

Measurement and Research. There are many questions in
secondary school administration, supervision, and teaching
which cannot be answered satisfactorily without some research.
Some form of measurement or comparison is assumed
in research studies. Schools are increasingly answering or
feeling the need to answer their questions by research and not
by mere opinion. As one principal says, “I could go far in
pointing out the benefits and cure-all qualities of this new
plan if I were not hampered by the facts.” (He had found out
through research that his plan was not as effective as he had
hoped it would be.)

8. Ibid.

9. Douglas, Harl R. Organization and Administration of Secondary Schools.
Boston: Ginn and Company, 1932, p. 396.

10. Ibid.

11. That is, the results should be the same if the tests were rescored (1) by
the same persons, or (2) by different persons.

Measurement  and  Objectives.  Briggs  presents  a challenge 
to research and to secondary education, in his discussion of the
increasing popularity of the secondary school:

“The very success of which we boast increases our obligation to inquire
as to the fundamental program and to measure the results more carefully in
terms of the purposes for which schools are established and supported” (12).

Here is set forth the thesis on which measurement of achievement
should be based:

“Measure the results more carefully in terms of the purposes
for which schools are established and supported.”

Measurement  in  the  secondary  school  has  far  to  develop 
before it will measure results in terms of the purposes of education.
It is much easier to measure facts and information
which  have  been  acquired  than  it  is  to  test  whether  objectives 
have  been  met. 

Another  value  of  measurement  is  that  it  may  furnish  more 
exact definition of the objectives of education. Sangren emphasizes:

“That  any  attempt  at  measurement  in  fields  of  instruction  tends  to  compel 
us  to  specify  much  more  definitely  what  the  objectives  of  instruction  are  and 
to  confine  us  to  the  sort  of  teaching  which  will  enable  us  to  realize  more 
definitely some particular aims” (13).

Need for Measurement. From the previous discussion on
the rapidity of the development of measurement and the extent
to which it has spread, there is evidence that it is occupying
an important place in secondary education. The relation
of measurement to the evaluation of instruction, provision for
guidance, recognition of individual differences, conduct of
supervision, diagnosis of personality, improvement of marking,
evaluation of research, and determination of the extent
to which objectives have been attained, all tend to stress its
importance in the program of secondary education.

12.  Briggs,  Thomas  H.  “Jeremiah  Was  Right.”  Teachers  College  Record, 
32:679-695,  May,  1931. 

13.  Sangren,  Paul  V.  “The  Present  Status  of  Measurement  in  the  Social 
Studies.”  The  Historical  Outlook,  21:279-?S3.  October,  1930. 



Plan  Followed  in  This  Study 

The material for this study represents the testing practices
of a large number of secondary schools throughout the country.
This material was obtained by means of check lists sent
to twelve hundred forty secondary school principals. Of these,
four hundred ninety-three returned the lists in time to be
included in the study. These replies came from schools with
enrolments ranging from 47 to over 6,000, from communities
of less than a thousand to New York City, and from
schools in each of the forty-eight states. A large number of
forms, bulletins, tests, and further descriptions of testing programs
accompanied the returned check lists.

A more  intensive  study  was  made  in  seventy  of  the  schools, 
where  each  teacher  was  asked  to  fill  out  a special  check  list. 
The testing practices of some sixteen hundred teachers were
thus sampled. The data from these replies are also introduced
when  they  are  pertinent  to  the  topic. 
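The size of the sample described above can be put in proportion with a little arithmetic. The sketch below uses only the figures stated in the text (1,240 check lists sent, 493 returned, 70 schools studied intensively, and "some sixteen hundred" teachers, here taken as 1,600 for illustration); the function and variable names are hypothetical.

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)


CHECK_LISTS_SENT = 1240      # principals who received the check list
CHECK_LISTS_RETURNED = 493   # returned in time to be included
INTENSIVE_SCHOOLS = 70       # schools in the more intensive study
TEACHERS_SAMPLED = 1600      # "some sixteen hundred"; an approximate figure

return_rate = percent(CHECK_LISTS_RETURNED, CHECK_LISTS_SENT)
teachers_per_school = round(TEACHERS_SAMPLED / INTENSIVE_SCHOOLS, 1)

print(f"Return rate of check lists: {return_rate} per cent")
print(f"Teachers per school in the intensive study: about {teachers_per_school}")
```

On these figures roughly two in five principals replied, and the intensive study averaged a little under twenty-three teachers per school. The second figure is only as exact as the rounded count of teachers.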

In addition to practice, a large amount of the increasing
literature on the questions under discussion has been presented
and related to the practice. Wherever possible, the literature
either represents the experimental evidence or summaries
of such evidence. Opinions of workers and writers in
measurement or related fields have been included whenever
such opinions add to the value of the presentation.

An  attempt  has  been  made  to  assemble  the  materials  in 
such  a form  that  they  will  be  as  usable  as  possible  for  those 
who  are  responsible  for  the  testing  program  in  the  secondary 
schools. It is hoped that the method of presentation used will
make  the  material  of  value  not  only  to  administrators  but  also 
to  teachers. 

A complete  understanding  of  the  problems  discussed  is  best 
arrived  at  when  the  background  of  the  testing  movement  is 
understood.  For  that  reason,  the  following  chapter  deals  with 
the  historical  development  of  testing.  Following  the  historical 
treatment,  Part  II  includes  a discussion  of  the  problems  in- 
volved in  the  administration  of  testing. 




SELECTED  REFERENCES 

References  listed  in  the  footnotes  are  not  necessarily  duplicated  in  the 
selected  references. 

Freeman, Frank N. Mental Tests. Boston: Houghton Mifflin Company, 1926,
Chapter  I. 

A brief  introduction  to  mental  tests. 

Hildreth,  Gertrude  H.  Psychological  Service  for  School  Problems.  Yonkers-on- 
Hudson,  New  York:  World  Book  Co.,  1930,  Chapter  III. 

Discusses  the  value  of  various  types  of  tests. 

Kelley,  Truman  Lee.  Interpretation  of  Educational  Measurements.  Yonkers-on- 
Hudson,  New  York:  World  Book  Company,  1927,  Chapter  II. 

Presents  purposes  served  by  educational  tests  and  discusses  certain  techni- 
cal requirements  necessary  for  the  fulfillment  of  these  purposes. 

Koos,  L.  V.  and  Kefauver,  G.  N.  Guidance  in  Secondary  Schools.  New  York: 
The  Macmillan  Company,  1932,  Chapters  X and  XI. 

Discusses  the  importance  of  measurement  in  guidance. 

Lang,  Albert  R.  Modern  Methods  in  Written  Examinations.  Boston:  Houghton 
Mifflin  Company,  1930,  Chapter  II. 

A discussion  of  the  functions  of  examinations. 

McCall, Wm. A. How to Measure in Education. New York: The Macmillan
Company,  1922,  Chapter  I. 

Statement  of  some  fourteen  theses  outlining  the  place  of  measurement  in 
education. 

Odell,  C.  W.  Educational  Measurement  in  High  School.  New  York:  The  Century 
Company,  1930,  Chapter  I. 

A discussion  of  the  need  of  measurement  in  education. 

Ruch,  G.  M.  The  Objective  or  New-Type  Examination.  Chicago:  Scott,  Fores- 
man  and  Company,  1929,  Chapter  I. 

Discusses  briefly  the  functions  and  kinds  of  examinations. 

Ruch, G. M. and Stoddard, George D. Tests and Measurements in High School
Instruction. Yonkers-on-Hudson, New York: World Book Company, 1927, Chapters
II and III.

Contains  a good  discussion  of  the  uses  and  limitations  of  tests  in  the  second- 
ary schools. 

Russell, Charles. Standard Tests. Boston: Ginn and Company, 1930, Chapters
I and  II. 

A discussion of the importance and background of measurement.

Symonds, Percival M. Measurement in Secondary Education. New York: The
Macmillan Company, 1927, Chapters I and II.

Discusses need and methods of improving measurement in the high school.

Tiegs, Ernest W. Tests and Measurements for Teachers. Boston: Houghton Mifflin
Company, 1931, Chapter I.

A valuable presentation of the function of measurement.


CHAPTER  II 

DEVELOPMENT OF TESTING IN THE HIGH SCHOOL

1.  Early  Beginnings 

The  testing  movement  as  we  know  it  today  has  followed 
two  distinct  lines  of  growth.  One  line  of  development  has 
come  about  through  the  work  of  psychologists  primarily  inter- 
ested in  the  measurement  of  mental  capacity,  while  the  other 
is  the  product  of  workers  interested  in  the  measurement  of 
educational  achievements.  These  two  developments,  though 
separate  at  the  beginning,  have  more  recently  been  fused  and 
are  developing  together.  Both  movements  began  to  be  active 
about 1895, although the writings of Horace Mann (1) in 1845
and Reverend George Fisher about 1864 (2) are illustrations
of earlier work in the field of educational measurement. However,
they seem to have had no later influence.

The  beginnings  of  achievement  testing  are  credited  to  Dr. 
J.  M.  Rice  who  in  1897  published  an  article(3)  based  upon 
results  obtained  from  testing  the  spelling  abilities  of  pupils 
in several cities. It was eleven years later that the first standardized
achievement test was published.(4)

2.  Mental  Testing 

Mental testing was first undertaken in this country by Cattell (5),
who was influenced by Galton’s work on individual
differences. Through the work of Cattell, Thorndike became

1.  Mann,  Horace.  Comments  on  the  Report  of  the  Examining  Committee  of 
the  Boston  Grammar  and  Writing  Schools  appearing  in  The  Common  School 
Journal,  7:330,  November  7,  1845. 

2. Chadwick, E. B. “Statistics of Educational Results.” The Museum, A
Quarterly Magazine of Educational Literature and Science, 3:479-84, January,
1864.

3. Rice, J. M. “The Futility of the Spelling Grind.” The Forum, 23:163-179,
409-419, April, June, 1897.

4. Stone, C. W. Arithmetical Abilities and Some Factors Determining Them.
New York: Teachers College Bureau of Publications, Columbia University,
Contribution to Education, Number 19, 1908, p. 101.

5. Cattell, J. McKeen. “Mental Tests and Measurements.” Mind, 15:373-380,
July, 1890.




interested  in  the  field  and  in  1904  published  the  first  book(6) 
dealing  with  educational  measurement.  This  book  remained 
the  standard  for  over  ten  years  and  its  influence  on  the  begin- 
nings of  educational  measurement  was  marked. 

A year later, in 1905, Binet, working with Simon, published
the first edition of the now famous method of measuring the intelligence
of an individual.(7) This scale was revised by Binet
in 1908 and again in 1911. The scale was first used in the
United States by Goddard, who used a translation in testing
some four hundred feeble-minded children.(8) Some of
the revisions of Binet’s scale for use in this country have
been made by Goddard (9), Kuhlmann (10), Terman and
Childs (11), Huey (12), Yerkes, Bridges and Hardwick (13),
and Herring (14).

Not  until  the  appearance  of  Terman’s  work  in  1916  did  the 
individual  test  of  intelligence  become  at  all  common.  As  The 
Stanford  Revision  of  the  Binet-Simon  Scale  it  became  the 
standard  and  has  since  been  used  in  the  public  schools  almost 
exclusively.  This  popular  scale  is  at  present  being  revised  and 
restandardized. 

Mental  testing  became  popular  with  the  introduction  of  the 
technique  for  testing  pupils  in  groups.  This  method  of  group 


6. Thorndike, E. L. An Introduction to the Theory of Mental and Social
Measurements. New York: Teachers College, Bureau of Publications, Columbia
University, 1904, p. 277. (Revised Edition 1913).

7. Binet, A. et Simon, T. “Méthodes Nouvelles pour le Diagnostic du Niveau
Intellectuel des Anormaux.” L’Année Psychologique, 11:191-244, 1905.

8. Goddard, H. H. “Four Hundred Feeble-Minded Children Classified by
the Binet Method.” Pedagogical Seminary, 17:387-397, September, 1910.

9. Goddard, H. H. “A Revision of the Binet Scale.” Training School Bulletin,
8:56-62, June, 1911.

10. Kuhlmann, F. “A Revision of the Binet-Simon System for Measuring the
Intelligence of Children.” Journal of Psycho-Asthenics, Monograph Supplement,
Vol. I, No. I, September, 1912, 41 p.

Kuhlmann, F. A Handbook of Mental Tests: A Further Revision and Extension
of the Binet Scale. Baltimore: Warwick and York, 1922, 208 pp.

11. Terman, L. M. and Childs, H. G. “A Tentative Revision and Extension of
the Binet-Simon Measuring Scale of Intelligence.” Journal of Educational
Psychology, 3:61-74, 133-43, 198-208, 277-89, February, March, April, May,
1912.

Terman, L. M. The Measurement of Intelligence. Boston: Houghton Mifflin
Company, 1916, 363 pp.

12. Huey, E. B. A Syllabus for the Clinical Examination of Children with
the Revised Binet-Simon Scale for the Measurement of Intelligence. Baltimore:
Warwick and York, 1912, 45 pp.

13. Yerkes, R. M., Bridges, J. W. and Hardwick, R. S. A Point Scale for
Measuring Mental Ability. Baltimore: Warwick and York, 1915, 218 pp.

14. Herring, John P. Herring Revision of the Binet-Simon Test: Examination
Manual, Form A. Yonkers-on-Hudson, New York: World Book Company, 1922,
56 pp.




intelligence  testing  was  first  developed  in  the  army  and  is 
discussed  briefly  in  the  section  on  contributing  agencies. 

3.  Achievement  Testing 

While  development  was  taking  place  in  the  individual  test- 
ing of  intelligence,  much  was  being  done  in  the  testing  of 
groups  of  pupils  by  standardized  achievement  tests.  This  work 
in  the  construction  of  educational  tests  was  largely  being  done 
by  Thorndike  and  his  students.  In  1908  the  first  standardized 
achievement  test,  in  arithmetic  reasoning,  was  published  by 
Stone  (15),  a student  of  Thorndike.  The  following  year  Court- 
is published  his  Arithmetic  Test  Series  A.  Thorndike  in  1910 
published  his  Handwriting  Scale  (16).  For  the  next  few  years 
tests  were  slowly  becoming  more  numerous  but  practically 
all  were  in  the  elementary  field. 

The  beginning  of  high  school  tests  is  fairly  well  marked.  In 
1916, Starch, in his volume Educational Measurements (17),
published  high  school  tests  in  Grammar,  Latin,  German, 
French  and  Physics.  During  the  same  year  Trabue’s  Comple- 
tion-Test Language  Scales  (18),  Hanus’  Latin  Tests  (19), 
Rugg’s  Tests  in  First-year  Algebra  (20),  Stockard  and  Bell’s 
Geometry  Test (21),  and  Thorndike’s  Scale  Alpha  2 for  Meas- 
uring the  Understanding  of  Sentences (22)  all  appeared.  Ac- 
cordingly, 1916  can  be  taken  as  the  date  when  standardized 
achievement  tests  were  first  available  for  general  use  in  the 
high  school.  Since  that  time  the  number  of  high  school  tests 
has  increased  very  rapidly.  In  1920  the  first  comprehensive 


15.  Stone,  C.  W.  op.  cit.  See  footnote 

16. Thorndike, E. L. “Handwriting Scale.” Teachers College Record, 11:1-93,
March, 1910.

17. Starch, Daniel. Educational Measurements. New York: The Macmillan
Company,  1916,  202  p. 

18.  Trabue,  M.  R.  Completion-test  Language  Scales,  New  York:  Teachers 
College,  Bureau  of  Publications,  Columbia  University,  Contributions  to  Edu- 
cation, No.  77,  1916,  118  p. 

19.  Hanus,  P.  H.  “Measuring  Progress  in  Learning  Latin.”  School  Review, 
24:342-345,  May,  1916. 

20. Rugg, H. O. “The Experimental Determination of Standards in First-Year
Algebra.” School Review, 24:342-345, May, 1916.

21. Stockard, L. V. and Bell, J. Carleton. “A Preliminary Study of the Measurement
of Abilities in Geometry.” Journal of Educational Psychology, 7:567-80,
December, 1916.

22.  Thorndike,  E.  L.  “An  Improved  Scale  for  Measuring  Ability  in  Reading.” 
Teachers  College  Record,  16:445-67;  17:40-67,  November,  1915,  January,  1916. 




bibliography  was  published  by  Monroe  and  listed  one  hundred 
four  tests  for  use  in  the  high  school  (23). 

Odell’s  Educational  Measurement  in  High  School (24)  pub- 
lished in  1930  contains  information  on  about  278  tests.  Table 
I,  in  presenting  copyright  dates  of  published  tests  now  avail- 
able, records  435  tests  for  use  in  the  high  school.  Including 
all  tests  which  are  suitable  for  secondary  schools,  there  is  a 
total of over 610 tests. There has indeed been a most remarkable
development (25) in the last sixteen years.

4.  Contributing  Agencies 

Contributing  to  this  rapid  growth  have  been  the  testing 
done  in  the  army,  the  survey  movement,  the  establishment  of 
research  bureaus  in  school  systems,  and  the  books  written 
on the subject. It is difficult to estimate the influence which
each of the factors has had because of the interaction of these
various factors. However, a brief description of their part
in the development of the measurement program is
given here.

Testing  in  the  army  was  probably  the  largest  single  factor 
in  popularizing  testing.  It  not  only  developed  techniques  and 
instruments whereby large numbers could be given intelligence
tests at one time, but it also popularized testing and familiarized
people with it.

Immediately  following  the  close  of  the  war,  Otis  published 
the first group intelligence test (26) for use in the schools.

Within  the  next  three  years  appeared  the  tests  which  have 
received  most  use  in  the  secondary  schools,  the  Terman  Group 
Test of Mental Ability, National Intelligence Tests, and Otis
Self-Administering  Tests  of  Mental  Ability. 

The  first  survey  to  use  achievement  tests  was  the  one  of 

23.  Monroe,  Walter  S.  “A  Bibliography  of  Standardized  Tests  for  the  High 
School.” Journal of Educational Research, 1:151-153, 229-242, 311-320, February,
March,  April,  1920. 

24.  Odell,  C.  W.  Educational  Measurement  in  High  School,  New  York:  The 
Century  Company,  1930,  641  p. 

25.  This  development  is  even  greater  than  the  610  tests  would  indicate 
because  those  not  commercially  available  have  been  omitted  from  Table  I. 
Only  fifty-one  of  the  tests  included  in  the  hundred  and  four  titles  of  Monroe’s 
first  bibliography  are  included  in  Table  I. 

26.  Otis,  S.  A.  “An  Absolute  Point  Scale  for  the  Group  Measurement  of 
Intelligence.” Journal of Educational Psychology, 9:233-61, 333-48, May, June,
1918. 




New York City schools.(27) Since that time a goodly number
of surveys have used tests in some form. Eells, writing in 1929,
reports that “An examination of some 200 published reports
of surveys ... showed that 72 of these made more or less extensive
use of standard intelligence or achievement tests.” (28)
Of this number by far the largest use was in the elementary
field; but, of the strictly high school subjects, Latin was tested
eighteen times, ranking eighth in the list of all subjects tested,
while algebra was tested seventeen times, French four times,
physics three, geometry and Spanish each twice, and chemistry
and general science each once.

The  first  research  bureau  was  established  in  Baltimore  in 
1912  and  was  known  as  the  Bureau  of  Statistics.  The  function 
of  the  bureau,  according  to  Chapman,  was  to  “make  adminis- 
trative studies  at  the  request  of  the  Board  or  Superinten- 
dent.” (29)  The  first  city  research  bureau  whose  organization 
was traceable directly and immediately to the testing movement
was at Leavenworth, Kansas. Chapman says, “A Department
of Tests and Measurements was organized on July 6,
1914,  at  Leavenworth,  Kansas.  It  was  established  for  the  pur- 
pose of  measuring  more  accurately  the  efficiency  of  the  school 
system,  and  of  discovering  weaknesses  in  the  system  by 
objective  standards.”  (30) 

Research  departments  developed  rapidly  from  that  time  to 
the  present.  Chapman  (31)  reports  sixty-nine  city  school  re- 
search bureaus  in  1927  and  forty-eight  others  such  as  state 
and university bureaus. Wright (32) lists some 118 city school
bureaus in 1931, and later in 1931 one hundred fifty-two
city bureaus are listed (33), with an additional eighty-eight
research bureaus in state departments, colleges, universities,

27.  “Final  Report  of  the  Committee  on  School  Inquiry,  Board  of  Estimate 
and Apportionment.” New York City: The Committee, 1911-1913, 3 vols.

28.  Eells,  Walter  Crosby  “Use  of  Standard  Tests  in  72  Published  School 
Surveys,”  School  Life,  14:168-169,  May,  1929. 

29. Chapman, Harold B. Organized Research in Education with Special Reference
to the Bureau of Educational Research. Columbus, Ohio: Ohio State University
Studies, Bureau of Educational Research Monograph, November 7,
1929, p. 90.

30.  Ibid.  p.  71. 

31.  Ibid.  p.  19. 

32.  Wright,  Edith  A.  Organization  and  Functions  of  Research  Bureaus  in 
City  School  Systems.  Washington,  D.  C.:  Office  of  Education,  Leaflet  No.  2, 
February,  1931,  14  p. 

33.  Educational  Directory,  1931.  Washington,  D.  C.:  Office  of  Education  Bulle- 
tin, 1931,  No.  1,  pp.  156-162. 



teachers colleges, normal schools, and others. Not all of these
two hundred forty bureaus are doing the type of research
requiring testing, but by far the larger number are. The
main  contribution  of  these  bureaus  to  the  testing  movement 
has  been  in  the  training  of  teachers  in  the  use  of  tests  and 
encouraging  the  construction  of  tests. 

The  books  which  have  contributed  to  growth  in  this  field 
are numerous. Park (34) presents an analysis in 1931 of some
75 books on the subject of mental or achievement tests, showing
that 47 per cent were written between 1925 and 1930. Of the
important books appearing before 1918, Monroe (35) lists
Thorndike’s Introduction to the Theory of Mental and Social
Measurements (36) (1904), Whipple’s Manual of Mental and
Physical Tests (37) (1910), Starch’s Educational Measurements (38)
(1916), Part I of the Fifteenth Yearbook of the
National Society for the Study of Education (39) (1916), Monroe,
DeVoss and Kelly’s Educational Tests and Measurements (40)
(1917), and Rugg’s Statistical Methods Applied to
Education (41) (1917).

Since  1918,  among  the  outstanding  books  are  Part  II  of  the 
Seventeenth  Yearbook  of  the  National  Society  for  the  Study 
of  Education  (42)  (1918),  Terman’s  Intelligence  of  School 
Children(43)  (1919),  McCall’s  How  to  Measure  in  Educa- 
tion (44)  (1922),  Pintner’s  Intelligence  Testing(45)  (1923), 


34.  Park,  Maxwell  G.  Training  in  Objective  Educational  Measurements  for 
Elementary School Teachers. New York: Teachers College, Columbia University,
(Contributions to Education, No. 520), 1932, 100 pp.

35.  Monroe,  Walter  S.  et  al.  Ten  Years  of  Educational  Research,  1918-1927. 
Urbana:  University  of  Illinois  Bulletin,  Vol.  25,  No.  51,  Bureau  of  Educational 
Research Bulletin No. 42, 1928, pp. 95-96.

36. Thorndike, E. L. op. cit., see footnote 6.

37.  Whipple,  G.  M.  Manual  of  Mental  and  Physical  Tests.  Baltimore:  War- 
wick and  York,  1910,  534  p.  (Revised  edition  1914,  Part  I,  365  p.  Part  II,  336  p.) 

38. Starch, Daniel. op. cit., see footnote 17.

39.  Strayer,  George  D.  et  al.  “Standards  and  Tests  for  the  Measurement  of 
the  Efficiency  of  Schools  and  School  Systems.”  Fifteenth  Yearbook  of  the 
National Society for the Study of Education. Chicago: University of Chicago
Press,  1916,  172  p. 

40. Monroe, W. S., DeVoss, J. C. and Kelly, F. J. Educational Tests and Measurements.
Boston: Houghton Mifflin Company, 1917, 309 pp. (Revised edition,
1924, 541 p.)

41. Rugg, H. O. Statistical Methods Applied to Education. Boston: Houghton
Mifflin Company, 1917, 410 pp.

42. Courtis, S. A. et al. “The Measurement of Educational Products.” Seventeenth
Yearbook of the National Society for the Study of Education, Part II.
Bloomington, Illinois: Public School Publishing Company, 1918, 192 p.

43. Terman, L. M. Intelligence of School Children. Boston: Houghton Mifflin
Company, 1919, 317 p.



Freeman’s Mental Tests (46) (1926), and Kelley’s Interpretation
of Educational Measurements (47) (1927).

In 1927 the first books dealing exclusively with measurement
in the high school appeared. These were Symonds’ Measurement
in Secondary Education (48) and Ruch and Stoddard’s
Tests and Measurements in High School Instruction (49),
followed three years later by Odell’s Educational Measurement
in High School (50) (1930).

5.  New-Type  or  Objective  Tests 

Another  phase  of  the  measurement  program  which  should 
be  considered  is  that  of  the  new-type  objective  tests.  McCall 
is  credited  with  being  the  first  to  write  on  the  new-type  test. 
At  the  end  of  the  article  in  which  he  first  describes  the 
true-false  test  he  remarks,  “Most  of  these  claims  rest  upon 
logical probability and a limited experience and not upon
experimental  data.  This  last  is  needed  and  will  follow  in 
time”  (51).  His  prediction  was  more  than  realized  in  the  hun- 
dreds of  articles  and  large  amount  of  “experimental  data” 
which  followed  in  the  next  ten  years.  Most  of  the  early  materi- 
al was  published  in  magazine  articles  but  several  volumes 
dealing  solely  with  the  new-type  examinations  have  appeared, 
among which are Ruch’s Improvement of the Written Examination (52)
(1924) and The Objective or New-Type Examination (53)
(1929), Russell’s Classroom Tests (54) (1926),
Odell’s Traditional Examinations and New-Type Tests (55)


44. McCall, W. A. How to Measure in Education. New York: The Macmillan
Company, 1922, 416 p.

45. Pintner, Rudolph. Intelligence Testing. New York: Henry Holt and Company,
1923, New Edition, 1931, 555 p.

46. Freeman, Frank N. Mental Tests. Boston: Houghton Mifflin Company,
1926, 503 p.

47. Kelley, Truman L. Interpretation of Educational Measurements. Yonkers-on-Hudson,
New York: World Book Company, 1927, 363 p.

48. Symonds, Percival M. Measurement in Secondary Education. New York:
The Macmillan Company, 1927, 588 p.

49. Ruch, G. M. and Stoddard, George D. Tests and Measurements in High
School Instruction. Yonkers-on-Hudson, New York: World Book Company,
1927, 381 p.

50. Odell, C. W. op. cit., see footnote 24.

51. McCall, W. A. “A New Kind of School Examination.” Journal of Educational
Research, 1:33-46, January, 1920, p. 45.

52. Ruch, G. M. The Improvement of the Written Examination. Chicago: Scott,
Foresman and Company, 1924, 193 p.

53. Ruch, G. M. The Objective or New-Type Examination. Chicago: Scott, Foresman
and Company, 1929, 478 p.

54. Russell, Charles. Classroom Tests. Boston: Ginn and Company, 1926, 346 p.

55. Odell, C. W. Traditional Examinations and New-Type Tests. New York:




TABLE I

An Analysis of the Copyright Dates of Tests Suitable for Use in Grades 9-12
Which Were Available in 1932*

Number of Tests Published Each Year

                 Before
Subjects           1920   '20   '21   '22   '23   '24   '25   '26   '27   '28   '29   '30   '31   '32  Total
English               9     4     6     4     9     2     9    10    13    14     8     7     4     1    100
Foreign Lang.         5     2     3     3     4     1     6     7     5     7     6     3     3     0     55
Mathematics           3     1     1     0     0     2     1     4     3     9     5     5     3     1     38
Social Science        0     2     1     1     0     2     0     4     2     6     3     4     6     0     31
Science               1     0     1     1     2     6     1     5     3     3     9     4     7     1     44
Industrial Arts       1     0     1     0     1     2     0     1     4     6     5     1     2     0     24
Commercial            0     0     1     2     2     2     1     4     0     2     9     6     1     1     31
Fine Arts             1     0     0     1     0     2     0     1     2     0     2     0     0     0      9
Miscellaneous         1     0     1     0     2     5     3     2     3     5     3     2     1     0     28
Pupil Rating          4     3     1     2     2     2     2     5     6     6     2     8    10     1     54
Intelligence          7     1     1     4     0     1     0     2     2     2     0     1     0     0     21
TOTALS               32    13    17    18    22    27    23    45    43    60    52    41    37     5    435

The subject fields comprise the following subjects:

English: Composition; Lang., Gram.; Lit., Poet.; Read., Voc.; Spelling; Speech.
Foreign Lang.: Latin; French; Spanish; German; Prognosis.
Mathematics: Algebra; Geometry; So. Geom., Tr.; Prognosis.
Social Science: Amer. Hist.; Europ. Hist.; World Hist.; Civics; Miscellaneous.
Science: Gen. Science; Chemistry; Physics; Biology; Botany.
Industrial Arts: Manual Arts; Mech. Drawing; Home Econ.
Commercial: Com. Arith. & Bus. Training; Bookkeeping; Shorthand; Typewriting;
Com. Law; Miscellaneous.
Fine Arts: Music; Art; Freehand Drawing.
Miscellaneous: Agriculture; Health, P. E.; Mech. Ability; Ach. Batteries.
Pupil Rating: Character**; Habits, Attitudes; Voc. Guid.; Guidance; Rating Scales.

The numbers in black are standardized tests while the numbers in bold
face are non-standardized tests.

*Includes only those tests now commercially available and those for which
copyright dates are available.

**Refers to the series by Hartshorne and May.




(1928),  and  Orleans  and  Sealy’s  Objective  Tests  (56)  (1928). 

6.  Increase  in  Number  of  High  School  Tests 

An  excellent  picture  of  the  growth  of  the  testing  movement 
in  the  high  school  can  be  obtained  by  studying  the  analysis 
of  the  dates  of  publication  of  the  tests.  In  order  to  know  how 
many  tests  are  available  for  use  in  secondary  schools  today,  a 
card  file  was  constructed,  listing  all  commercially  available 
tests.  That  the  list  thus  compiled  is  absolutely  complete  is 
doubtful; however, it does include practically all tests, as a careful
check has been made against publishers’ catalogs, available
bibliographies  of  tests  (57)  and  the  files  of  Teachers  College 
Library. 

The  tests  are  listed  by  subjects  under  the  date  of  first  publi- 
cation in  Table  I.  Only  the  commercially  available  tests  suit- 
able for  use  in  grades  9-12  for  which  copyright  dates  were 
obtainable  are  included  here.  Tests  constructed  by  various 
research  departments  have  not  been  listed  unless  they  are 
made  available  through  publishers. 

TABLE II

An Analysis of the Development of High School Tests in Terms of the Per Cent
of Tests in Each Subject Distributed According to Date of Publication*

Per Cent of Tests Published Each Year

                  No. of  Before
Subjects           Tests    1920   '20   '21   '22   '23   '24   '25   '26   '27   '28   '29   '30   '31 '32**
English              100       9     4     6     4     9     2     9    10    13    14     8     7     4     1
Foreign Lang.         55       9     4     5     5     8     2    11    13     9    13    11     5     5     0
Science               44       2     0     2     2     5    14     2    11     7     7    21     9    16     2
Mathematics           38       8     3     3     0     0     5     3    10     8    23    13    13     8     3
Social Science        31       0     0     6     3     0     6     0    13     6    20    10    16    20     0
Commercial            31       0     0     3     7     7     7     3    13     0     7    28    19     3     3
Industrial Arts       24       4     0     4     0     4     9     0     4    17    25    20     4     9     0
Fine Arts              9      11     0     0    11     0    22     0    11    22     0    22     0     0     0
Pupil Rating          54       7     5     2     4     4     4     4     9    11    11     4    15    18     2
Miscellaneous         28       4     0     4     0     7    17    16     7    11    17    11     7     4     0
Intelligence          21      34     5     5    19     0     5     0     9     9     9     0     5     0     0
Total                435       7     3     4     4     5     6     5    10    10    14    12    10     9     1

*Includes the same tests as are included in Table I. The figure in bold face
type indicates the location of the median for that row.

**Includes only the first two months of 1932.


56.  Orleans,  J.  S.  and  Sealy,  G.  A.  Objective  Tests.  Yonkers-on-Hudson,  New 
York:  World  Book  Company,  1928,  373  p. 

57.  See  bibliography  at  end  of  Chapter  III. 




There  are  tests  for  practically  every  subject  taught  in  the 
high school. Some subjects, such as English, have a large
number  while  others,  such  as  fine  arts,  have  only  a few. 

It  is  rather  difficult  to  see  trends  in  Table  I so  Table  II  has 
been  constructed  giving  the  per  cent  of  tests  published  each 
year  for  each  subject  field  separately.  The  per  cent  in  bold  face 
type  indicates  the  year  which  is  the  mid-point  of  the  develop- 
ment of  the  tests  to  date.  Briefly  the  trends  as  shown  in  Table 
II  are: 

1. Tests in English developed slowly, reached their maximum in 1927 and
1928, and showed a considerable decline by 1931. Over half the tests in this field
were available by 1926.

2.  Foreign  language  shows  somewhat  the  same  tendency  as  English  except 
that  there  are  only  half  as  many  tests  printed. 

3.  Science,  ranking  third  with  44  tests,  showed  practically  no  development 
until  1924.  Other  high  spots  occurred  in  1926,  1929,  and  1931.  The  tests  issued 
in  1929  and  1931  were  largely  non-standardized,  instructional  type  of  tests. 

4.  Mathematics  received  a rather  good  start  before  1920  but  it  was  not  until 
1926  that  much  more  was  done.  The  high  spot  was  reached  in  1928  when  nearly 
a fourth of the tests now available were published.

5.  Social  science  is  another  subject  that  has  shown  increasing  activity  in  the 
last  four  years.  Two-thirds  of  the  tests  in  this  field  have  copyright  dates  of 
1928  or  later. 

6. Commercial tests were slower in appearing than were the social studies
but showed a somewhat more even trend. The high points were 1926, 1929, and
1930, with a decided slump in 1931.

7. Most of the tests in industrial arts were published between 1927 and
1929.

8. Fine arts claim relatively few tests and these were scattered over the
entire period.

9.  Tests  for  rating  pupils  including  character,  attitude,  interest,  and  guidance 
measurements  have  been  gradually  developing.  An  increased  interest  is  noted 
for  such  measurement  in  1930  and  1931.  In  fact  it  is  the  only  field  in  which 
more  tests  were  published  in  1931  than  in  any  other  single  year  previously. 
Development  in  this  field  appears  to  have  only  begun.  Test  construction  will 
be  made  by  the  schools.  The  first  book  summarizing  the  studies  in  this  field, 
Symonds,  Diagnosing  Personality  and  Conduct(58),  was  first  available  in  Janu- 
ary, 1932. 

10.  Intelligence  tests  showed  their  greatest  development  in  the  period  between 
1918  and  1922.  There  have  been  comparatively  few  published  in  recent  years 
nor  does  there  appear  to  be  any  urgent  need  for  more  tests  in  this  field. 

A study  of  the  total  number  of  tests  published  each  year 
reveals  that  1928  was  the  peak  year  in  the  development  of 
high  school  tests  when  some  sixty  appeared.  Over  half  the 

58.  Symonds,  Percival  M.  Diagnosing  Personality  and  Conduct.  New  York: 
The  Century  Company,  1931,  602  p. 



tests  have  been  published  since  1927  when  the  first  two  vol- 
umes devoted  exclusively  to  high  school  measurement  were 
published,  that  of  Ruch  and  Stoddard  and  that  of  Sym- 
onds.(59) 

The  435  tests  by  no  means  represent  all  which  are  available 
for  secondary  schools.  There  are  an  additional  50  for  which 
it  was  impossible  to  obtain  copyright  dates.  Over  125  tests 
suitable  for  use  in  junior  high  schools  were  not  included.  This 
makes  a minimum  of  610  tests  which  are  available  for  use  in 
secondary  schools  at  present  and  this  number  has  been 
increasing  at  the  rate  of  over  50  a year. 

TABLE III

A Comparison of the Development of Standardized and
Non-Standardized Tests*

Year            Standardized    Non-Standardized       Total
Published        No.      %       No.      %        No.      %

Before 1920       32      9                          32      7
1920              13      3                          13      3
1921              17      5                          17      4
1922              16      4        2       3         18      4
1923              22      6                          22      5
1924              25      7        2       3         27      6
1925              21      6        2       3         23      5
1926              41     11        4       5         45     10
1927              41     11        4       5         45     10
1928              46     13       14      20         60     14
1929              33      9       19      26         52     12
1930              32      9        9      13         41     10
1931              22      6       15      21         37      9
1932**             3      1        2       3          5      1

Total            364    100%      71     100%       435    100%


*Non-Standardized tests include printed tests which do not have some type
of norms and also the type of test known as instructional tests.

**Includes only the first two months of 1932.

The  emphasis  has  been  changing  somewhat  toward  the  pub- 
lication of  instructional  and  unit  tests.  These  non-standardized 
tests  which  first  appeared  in  1928  in  marked  numbers  have  in- 
creased rather  steadily  since  then.  The  data  in  Tables  I and  III 
are  least  accurate  in  respect  to  these  non-standardized  tests  for 
it  is  not  only  difficult  to  distinguish  between  a test  book  and 
a work  book  but  the  publishers  are  more  widely  scattered  and 
the  information  more  difficult  to  obtain. 


59.  See  footnotes  48  and  49  for  exact  reference. 




7.  Improvement  in  Quality  of  High  School  Tests 

The  development  that  has  taken  place  in  testing  is  to  be 
measured  not  only  by  the  increase  in  the  number  of  tests  but 
also  by  the  increase  in  quality.  Striking  differences  can  be  seen 
in  most  subjects  when  the  early  tests  are  compared  with  the 
later  productions.  An  example  of  this  can  be  seen  in  reading 
tests  on  the  high  school  level  in  comparing  the  Thorndike-Mc- 
Call  Reading  Scale  (60)  published  in  1920  with  the  recent 
revised  edition  of  the  Iowa  Silent  Reading  Test  (61)  published 
in  1931. 

Read  this  and  then  write  the  answers.  Read  it  again  if  you  need  to. 

For  nearly  thirty  years  “Lewis  Carroll”  was  a lecturer  on  mathematics  at 
Oxford.  He  studied  divinity  and  occasionally  preached,  but  his  shy  and  retiring 
nature,  together  with  a tendency  to  stammer,  kept  him  from  the  regular 
ministry.  He  gave  many  lectures  to  audiences  made  up  mainly  of  children. 
These  lectures  were  of  various  sorts,  but  consisted  principally  of  narratives 
from  his  books  illustrated  by  lantern  pictures.  He  invented  a number  of  mathe- 
matical games. 

26.  Write  the  word  “book”  as  it  would  sound  when  spoken  by  a person  who 
stammers — . 

27.  What  study  is  mentioned  in  the  paragraph  which  is  a preparation  for 
the  regular  ministry? — - 

28.  Write  one  word  which  could  have  been  used  in  the  last  line  of  the 
paragraph  instead  of  a “number  of”  

Do  the  next  page 

Figure  I.  A Reproduction  of  Page  6 from  the  Thorndike-McCall  Reading 
Scale.  Form  I. 

The Thorndike-McCall Reading Scale consists of nine paragraphs
upon which thirty-five questions are based. The pupil
writes in the answer to the question. An example of a paragraph
on the high school level is given in Figure I. Such a
test is not only difficult to score but confuses a variety of reading
skills. Another difficulty is in the small number of items.
For instance, if a pupil had 27 questions correct, he would
have a grade score equivalent to the beginning of the eighth
grade. If he got only one more correct, 28, he would be considered
as having ninth grade reading ability while a score
of 29 would be equivalent to twelfth grade ability. This is
sufficient to show that such a test could not differentiate

60.  Published  by  Teachers  College,  Bureau  of  Publications,  Columbia  Univer- 
sity, New  York. 

61.  Published  by  World  Book  Company,  Yonkers-on-Hudson,  New  York. 




between high school pupils. It also shows that a small error on the part
of either pupil or scorer would probably make at least a year’s
difference in the pupil’s apparent reading ability.

In contrast to the early reading test described, the Iowa
Silent Reading Test, Revised, illustrates the development
which has been taking place in the theory and practice of test
construction. The test measures four major aspects of silent
reading ability described as (1) Comprehension, (2) Organization,
(3) Ability to Locate Information, and (4) Rate of
Reading. The outline of the test is as follows:

Test 1. Paragraph Meaning
    A. Science
    B. Literature

Test 2. Word Meaning
    A. Social Science Vocabulary
    B. Science Vocabulary
    C. Mathematics Vocabulary
    D. English Vocabulary

Test 3. Paragraph Organization
    A. Selection
    B. Outlining

Test 4. Sentence Meaning

Test 5. Location of Information
    A. Use of the Index
    B. Selection of Key Words

Test 6. Rate of Silent Reading

It would be difficult to give extracts from the test as the
material is so varied that to gain a complete understanding
of it, the whole test would have to be reproduced.

The  test  not  only  measures  these  varied  reading  skills  but 
contains  202  items,  not  including  the  rate  of  reading  test,  as 
compared  to  35  items  of  the  Thorndike-McCall.  The  difference 
between  eighth  grade  and  ninth  grade  ability  is  represented 
by  11  points  in  score  instead  of  by  1,  between  the  eighth  and 
twelfth  grade  by  52  points  instead  of  by  2. 
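
The comparison just drawn can be restated as score points per grade. The following is a minimal sketch only; the figures are those quoted in the text, while the function names and the assumption that points are spread evenly over the grades between are illustrative and are not taken from either test manual.

```python
# Score points separating grade levels, as quoted in the text.
# Keys are (lower grade, upper grade); values are score points between them.
thorndike_mccall = {(8, 9): 1, (8, 12): 2}
iowa_silent_reading = {(8, 9): 11, (8, 12): 52}


def points_per_grade(spread):
    """Average score points available to separate one grade from the next."""
    return {span: pts / (span[1] - span[0]) for span, pts in spread.items()}


def grades_shifted_by_one_point(spread, span=(8, 12)):
    """Approximate grade shift caused by a one-point error over `span`.

    Assumes (for illustration only) that points are spread evenly over the span.
    """
    lower, upper = span
    return (upper - lower) / spread[span]


print(points_per_grade(thorndike_mccall))
print(points_per_grade(iowa_silent_reading))
print(grades_shifted_by_one_point(thorndike_mccall))
print(round(grades_shifted_by_one_point(iowa_silent_reading), 3))
```

Under this even-spread assumption a single point on the earlier scale corresponds to about two grades between the eighth and twelfth, while a single point on the later test corresponds to less than a tenth of a grade.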

These  contrasting  examples  stress  the  points  at  which  the 
more  recent  tests  show  decided  improvement  over  the  earlier 
ones.  There  is  a distinct  tendency  to  provide  a more  diagnost- 
ic type  of  test.  Interest  has  swung  from  general  measures  of 
achievement  to  the  measurement  of  the  various  units  or  skills 
which  are  included  in  the  general  ability.  This  is  a distinct  step 
forward  for  it  makes  possible  remedial  and  corrective  teaching 
on  the  basis  of  test  results. 



The  other  trend  is  toward  making  instruments  which  differ- 
entiate more  closely  between  individuals.  The  more  marked 
the  differentiation  between  individuals  that  can  be  obtained, 
the  more  useful  are  the  results  for  guidance  and  instructional 
purposes.  This  differentiation  is  obtained  by  including  more 
items  and  by  having  the  test  cover  a smaller  range  of  subject 
matter.  A coincidental  advantage  is  usually  the  increase  in 
the  reliability  of  the  tests. 

8.  Summary 

A brief  historical  summary  of  the  development  of  mental 
and  achievement  tests  is  presented.  Standardized  testing  can 
be said to have begun its rapid development in the high school
in 1916. Since that time over 600 usable secondary school
tests have been published. Development has taken place
not only in numbers but in quality. Tests now tend to be more
diagnostic  and  to  differentiate  more  between  individuals.  The 
emphasis  needs  to  be  placed  on  developing  better  tests,  not 
more  tests. 


SELECTED REFERENCES

Freeman, Frank N. Mental Tests. Boston: Houghton Mifflin Company, 1926,
Chapters II-VI.

Contains data on the development of mental tests.

Kelley, Truman Lee. Interpretation of Educational Measurements. Yonkers-on-Hudson,
New York: World Book Company, 1927, Chapter I.

Presents a brief statement of the historical development of various concepts
and terms in measurements.

Odell, C. W. Educational Measurement in High School. New York: The Century
Company, 1930, Chapter II.

A statement of the development of testing.

Peterson, Joseph. Early Conceptions and Tests of Intelligence. Yonkers-on-Hudson,
New York: World Book Company, 1925, 320 p.

Excellent treatment of the historical development of intelligence tests.

Pintner, Rudolph. Intelligence Tests, New Edition. New York: Henry Holt and
Company, 1931, Chapters I-III.

A treatment of the historical development of intelligence tests.

Ruch, G. M. and Stoddard, G. D. Tests and Measurements in High School
Instruction. Yonkers-on-Hudson, New York: World Book Company, 1927,
Chapter I.

Brief description of the status of measurement in secondary schools.

Walker, Helen M. Studies in the History of Statistical Method. Baltimore:
Williams and Wilkins Company, 1929, Chapter VIII.

Lists the origin of certain terms used in measurement.


PART  II 

ADMINISTRATION  OF  TESTING 


CHAPTER  III 

PLANNING  THE  PROGRAM  OF  STANDARDIZED  TESTS 

Introduction 

The  material  for  Part  II  has  been  obtained  from  replies  to 
check  lists  sent  to  secondary  school  administrators,  from  bulle- 
tins, reports  and  descriptions  of  programs  returned  with  the 
check  lists,  and  from  a study  of  the  literature,  experimental 
and  otherwise,  bearing  on  these  topics.  The  check  lists  were 
sent  to  a sampling  of  1240  secondary  school  principals  in  com- 
munities of  various  sizes  in  all  of  the  forty-eight  states.  Re- 
turns were  obtained  from  40%  of  the  principals  in  61%  of  the 
cities.  A further  analysis  showed  that  97%  of  the  cities  over 
100,000, 69% of the cities between 25,000 and 100,000, and
52% of the towns below 25,000 to which lists were sent are
represented in the final analysis. Further, a post card
follow-up indicates that the practices reported for large cities
are substantially accurate but that small towns on the average
do slightly less testing than is indicated here.
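
The stated return of 40% can be checked roughly against the number of cases tabulated later in the chapter. This is only a sketch: it assumes that the 493 schools reported as the number of cases in Tables IV and V represent all of the usable returns, which the text does not state outright, and the variable names are illustrative.

```python
principals_sampled = 1240  # check lists sent out (from the text)
schools_tabulated = 493    # number of cases in Tables IV and V (assumed to be all returns)

return_rate = schools_tabulated / principals_sampled
print(f"{return_rate:.1%}")
```

On that assumption the rate comes to a little under 40%, which agrees with the rounded figure given above.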

The  topics  discussed  in  this  section  are: 

1.  What  types  of  standardized  tests  are  available  for  use? 

2.  What  types  of  testing  programs  are  the  schools  conducting? 

3.  Who  should  be  responsible  for  the  selection  of  tests? 

The administration of intelligence, standardized achievement,
and teacher-made tests is discussed in later chapters.

1.  What  Types  of  Standardized  Tests  Are  Available  for  Use? 

The  first  step  in  planning  the  testing  program  of  a secon- 
dary school  is  to  decide  upon  the  purpose  which  the  testing 
should  serve.  The  only  excuse  for  a testing  program  of  any 
kind  is  in  terms  of  the  use  made  of  the  results.  H.  G.  Pratt 




brings this out forcibly in her statement that “the only justification
of testing programs is better education for the children
tested(1).”

Testing  is  important  in  the  secondary  school  but  it  is  import- 
ant only  so  far  as  the  results  effect  desirable  changes,  however 
slight,  in  the  educational  process.  It  is  necessary  to  understand 
where  measurement  can  contribute  something  of  value  to  the 
school  situation  (Chapter  I)  and  then  outline  the  specific 
situations  to  which  the  tests  will  contribute.  Briefly,  tests  are 
valuable  instruments  in  guidance,  classification  and  promo- 
tion, supervision  and  improvement  of  instruction. 

The  second  step  is  to  decide  what  kinds  or  types  of  tests  are 
most suitable for the needs to be met. The consensus of opinion
is that a well planned testing program will use both
teacher-made  tests  and  standardized  tests.  It  is  the  purpose  of 
this  chapter  to  discuss  only  the  standardized  tests;  the  teacher- 
made  tests  are  discussed  in  Chapter  VI. 

TABLE IV

Per Cent of Junior, Senior and Six Year High Schools
Using Various Types of Standardized Tests*

Tests Used                 Junior    Senior    Six Year    Total

Intelligence                 82        74         91         80
Achievement                  78        60         89         70
Aptitude or Prognosis        10        24         33         23
Rating Scales                 8        11         17         12
Character Tests               2         3          5          3

Number of Cases             123       276         94        493

*The per cents for this table were derived from the number checking the
uses which they made of tests so the figures will not be exactly similar to those
of Tables IV and V. The main difference occurs in achievement tests and this
can be attributed to the fact that the achievement test results are not used by all
the principals in the schools in which they are given.

The  different  types  of  standardized  tests  which  are  most 
used  in  the  secondary  schools  are  1.  intelligence  tests,  2.  ach- 
ievement tests,  3.  aptitude  or  prognostic  tests,  4.  rating  scales, 
5.  character  or  personality  scales,  and  6.  interest  and  attitude 
questionnaires.  This  arrangement  gives  the  rank  order  of  the 
types  according  to  their  present  use.  The  per  cent  of  junior, 

1.  Pratt,  Helen  G.  “Proper  Use  of  Educational  Measurements— Considering 
their Dangers and Difficulties.” Journal of Educational Method, 7:204-208,
February, 1928.




senior  and  six  year  high  schools  making  some  use  of  each 
type(2)  is  given  in  Table  IV. 

The  program  of  most  secondary  schools  is  limited  to  the 
use  of  intelligence  and  standardized  achievement  tests,  with 
about  a fourth  of  them  using  aptitude  tests,  a much  smaller 
number  using  rating  scales  and  practically  none  of  them  using 
character  or  personality  tests.  A sound  comprehensive  pro- 
gram needs  all  of  these  types,  but  the  limitations  of  individual 
school  situations  which  include  lack  of  trained  personnel  or 
finances  have  to  be  taken  into  consideration. 

Intelligence  tests.  There  has  been  and  still  is  a great  deal 
of  controversy  over  what  intelligence  is  and  what  intelligence 
tests  measure.  To  consider  all  the  issues  involved  and  present 
all  view-points  is  impossible  in  this  limited  space.  There  are 
those  who  are  certain  that  the  present  intelligence  tests  do 
furnish  a reliable  measure  of  intelligence,  those  who  are  not  so 
certain  what  the  tests  measure  but  feel  that  they  are  useful 
in  school  work,  and  those  who  are  sure  that  the  tests  do  not 
measure  intelligence  and  that  they  should  not  be  used  in  the 
school.  Buckingham  in  considering  the  question  from  the 
standpoint  of  the  schools  concludes: 

“It,  therefore  appears  that  our  best  answer  to  the  question,  “What  is  intelli- 
gence?” is — so  far  as  school  matters  are  concerned — the  ability  to  learn. 

“Considered  in  this  light  the  measurement  of  intelligence  is  indeed  funda- 
mental for  school  purposes(3).” 

There  are  two  types  of  intelligence  tests  which  can  be  given 
in  the  high  school,  individual  tests  and  group  tests.  Individual 
intelligence  tests  can  only  be  given  to  one  person  at  a time 
and  also  require  a trained  worker  to  administer  them.  A list 
of  such  tests  is  given  in  Chapter  II  (see  references  9 to  14). 
Group  intelligence  tests  can  be  given  to  practically  any  desired 
number  of  persons,  the  only  limitation  being  seating  space. 
Over 600 pupils have been administered such tests in a high
school auditorium at one time by using teachers as monitors. Usually
such tests are given in the ordinary classrooms and can be
easily administered by teachers who are willing to study
somewhat carefully the directions given in the test manuals.

2. Interest and attitude questionnaires were not included for they are
largely in experimental form at present.

3. Buckingham, B. R. Research for Teachers. New York: Silver, Burdett
and Company, 1926, p. 142.




Lists  of  the  group  tests  can  be  obtained  by  consulting  the 
references  in  the  bibliography  at  the  end  of  the  chapter  or  any 
of  the  publishers  will  be  glad  to  send  lists  of  their  publica- 
tions. 

The  main  uses  which  are  made  of  intelligence  tests  by  sec- 
ondary school  administrators  are: 

1.  To  help  in  forming  ability  groups  within  a grade.  (277  schools)  (4). 

2.  To  determine  whether  a pupil  is  working  up  to  capacity.  (272  schools). 

3.  To  aid  in  determining  which  pupils  are  capable  of  doing  exceptional  work. 
(256  schools). 

4.  To  aid  in  studying  and  advising  failing  pupils.  (245  schools). 

5.  To  furnish  information  concerning  the  probable  success  a pupil  will  have 
in  a certain  curriculum.  (219  schools). 

6.  To  furnish  an  estimate  of  a pupil’s  probable  success  in  college.  (209 
schools). 

Other  uses  are  given  later  in  discussing  uses  of  tests  in  classi- 
fication and  promotion,  guidance,  and  supervision.  Needless 
to  say,  any  instrument  which  will  assist  in  doing  these  things 
should  have  a place  in  the  high  school  program. 
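
The counts in the list above may be easier to compare as per cents of the 493 schools reporting (see footnote 4). The sketch below makes that conversion; the shortened labels and variable names are illustrative only, and the counts are those given in the list.

```python
schools_reporting = 493  # total number of schools (footnote 4)

# Number of schools reporting each use of intelligence tests (from the list above).
uses = {
    "forming ability groups": 277,
    "working up to capacity": 272,
    "exceptional work": 256,
    "failing pupils": 245,
    "success in a curriculum": 219,
    "success in college": 209,
}

per_cents = {use: round(100 * n / schools_reporting, 1) for use, n in uses.items()}
for use, pct in per_cents.items():
    print(f"{use}: {pct}%")
```

The same conversion can be applied to the lists of uses given for the other types of tests, since each is based on the same 493 schools.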

Achievement  Tests.  The  outstanding  purpose  of  achieve- 
ment tests  is  to  improve  teachers’  judgments  of  the  pupils’ 
achievements.  Teachers  have  always  judged  the  value  of  the 
work  of  their  students  and  marked  them  in  some  manner  on 
the  basis  of  that  estimate.  Some  16,800,000  marks  are  given 
at  least  once  a year  by  teachers  throughout  the  country  and 
recorded  in  high  school  offices.  This  computation  is  based  on 
the  fact  that  there  are  over  4,200,000  pupils  in  the  secondary 
school  and  the  assumption  that  each  pupil  took  four  subjects. 
Any  instruments  which  can  help  make  these  marks  more 
closely  represent  the  actual  achievement  of  the  pupils  have  a 
place  in  the  high  school  program. 
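
The computation described in this paragraph is a single multiplication, sketched here with illustrative variable names. Both inputs are the author's own: the enrolment figure is stated as a minimum ("over 4,200,000"), and four subjects per pupil is the stated assumption.

```python
pupils_enrolled = 4_200_000  # "over 4,200,000 pupils" (from the text)
subjects_per_pupil = 4       # the assumption stated in the text

marks_per_marking = pupils_enrolled * subjects_per_pupil
print(f"{marks_per_marking:,}")
```

Since the enrolment is given as a lower bound, the result is likewise a lower bound on the marks recorded at each marking.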

There  are  many  studies (5)  which  show  that  marks  are 
not  reliable  and  do  not  represent  achievement.  For  instance, 

4.  The  number  of  schools  indicates  the  number  of  the  493  schools  which 
make  use  of  tests  for  the  given  purpose. 

5. The following are typical of such studies:

Hughes, W. H. “Analyzing the Ingredients of Teachers’ Marks.” Nation’s Schools,
6:21-25, December, 1930.

Kelly, F. J. Teachers’ Marks, Their Variability and Standardization. Teachers
College, Bureau of Publications, Columbia University, Contributions to Education,
No. 66, 1914, 83 pp.

Lee, Dorris May and Lee, J. Murray. “Some Relationships between Algebra
and Geometry.” Journal of Educational Psychology, 22:551-560, October, 1931.

Starch, D. and Elliott, E. C. “Reliability of Grading High School Work in
Mathematics.” School Review, 21:254-259, April, 1913.



there  is  data  to  show  that  for  equal  achievement  in  algebra 
and  geometry  girls  received  higher  marks  than  boys(6). 

Standardized  achievement  tests  are  provided  in  practically 
every subject in the curriculum as Table I shows. The principal
difficulty lies in selecting tests which measure the extent
to  which  the  objectives  of  the  course  have  been  attained.  Stan- 
dardized tests  usually  aim  to  measure  the  body  of  subject 
matter  most  frequently  taught  throughout  the  country  and 
thus  occasionally  are  not  especially  suitable  for  individual 
courses. 

The main uses of standardized achievement tests by secondary
school administrators are:

1.  To  compare  the  standing  of  the  school  with  the  test  norms.  (239  schools). 

2.  To  determine  whether  a pupil  is  working  up  to  capacity.  (210  schools). 

3.  To  measure  progress  during  the  semester  or  year.  (207  schools). 

4.  To  stimulate  interest  on  the  part  of  the  teachers  in  the  improvement  of 
instruction.  (192  schools). 

5.  To  aid  in  advising  and  studying  failing  pupils.  (180  schools). 

6.  To  help  in  forming  ability  groups  within  a grade.  (179  schools). 

Aptitude  or  prognosis  tests.  Aptitude  tests  provide  meas- 
urement of  probable  success  in  individual  subjects  or  subject 
fields.  At  present  there  are  aptitude  tests  available  for  use  in 
mechanical  ability,  foreign  language,  mathematics,  art,  music, 
and  commercial  subjects.  However,  the  number  of  such  tests 
on  the  market  is  quite  limited. 

Aptitude  tests  aim  to  predict  success,  and  the  efficiency  with 
which  they  do  predict  achievement  furnishes  the  means  of 
evaluating  them.  In  this  they  differ  from  achievement  tests, 
for  a teacher  can  study  an  achievement  test  to  discover  if  it 
measures  the  subject  matter  of  her  course,  but  no  one  can 
look  at  an  aptitude  test  and  tell  how  well  it  is  going  to  work. 
The  basis  for  judgment  of  the  validity  of  prognosis  tests  is  the 
statistical  results  reported  by  the  author  or  others  from  the 
actual  use  of  the  test. 

The  main  uses  made  of  prognosis  or  aptitude  tests  in  the 
schools  are: 

1.  To  furnish  information  concerning  the  pupil’s  probable  success  in  a cer- 
tain subject.  (60  schools). 

2.  To  furnish  information  concerning  the  pupil’s  probable  success  in  a cer- 
tain curriculum.  (59  schools). 

6.  An  unpublished  study  by  the  writer  based  on  800  cases  in  five  schools 
in  three  different  systems. 




3. To bring about a better understanding of the capabilities of the pupils
when discussing educational or vocational plans of the pupils with the
parents. (48 schools).

4.  To  aid  in  advising  and  studying  failing  pupils.  (38  schools). 

5. To aid in determining which pupils are capable of doing exceptional
work. (32 schools).

6.  To  help  in  forming  ability  groups  within  a grade.  (29  schools). 

Another  use  that  has  been  suggested  and  which  is  most 
important  is  to  determine  “whether  low  accomplishment  is 
due  to  lack  of  ability  or  lack  of  application  (7).” 

Rating  Scales.  Rating  scales  provide  a means  of  increasing 
the  definiteness  of  teachers’  judgment  on  traits  which  cannot 
be  measured  by  objective  means,  such  as  industry,  accuracy, 
initiative,  reliability,  cooperation,  and  leadership.  Examples 
of  rating  scales  which  have  been  used  in  the  school  situation 
are  those  by  Hughes (8)  and  Haggerty,  Olson  and  Wick- 
man(9). 

The  opportunities  which  are  offered  in  the  use  of  rating 
scales  have  hardly  begun  to  be  realized  by  the  schools.  The 
most  extensive  work  in  a public  school  system  has  been  by 
W.  H.  Hughes  at  Pasadena,  California.  The  best  available 
discussion of rating methods and rating scales is given by
Symonds(10) and a brief summary of such methods as they
relate to guidance has been prepared by Koos and Kefauver(11).

The  main  uses  of  rating  scales  reported  by  schools  are: 

1.  To  aid  in  determining  whether  a pupil  should  be  recommended  to  college 
or  university  (Administrative).  (18  schools). 

2.  To  help  in  forming  ability  groups  within  a grade.  (14  schools). 

3.  To  aid  in  advising  and  studying  failing  pupils.  (13  schools). 

4.  To  bring  about  a better  understanding  of  the  pupils’  capabilities  when 
discussing  educational  plans  of  the  pupils  with  the  parents.  (12  schools). 

5.  To  stimulate  interest  on  the  part  of  teachers  in  the  improvement  of  in- 
struction. (12  schools). 

6.  To  furnish  an  estimate  of  the  pupils’  probable  success  in  college  (guid- 
ance). (11  schools). 

7. Lee, J. Murray and Lee, Dorris May. “The Construction and Validation of
a Test of Geometric Aptitude.” Mathematics Teacher, 25:202, April, 1932.

8. Hughes, W. H. “A Rating Scale for Individual Capacities, Attitudes and
Interests.” The Journal of Educational Method, 3:56-65, October, 1923.

9. Haggerty-Olson-Wickman Behavior Rating Scales. Yonkers-on-Hudson,
New York: World Book Company, 1930.

10. Symonds, Percival M. Diagnosing Personality and Conduct. New York:
The Century Company, 1931, pp. 41-121.

11. Koos, L. V. and Kefauver, G. N. Guidance in Secondary Schools. New
York: The Macmillan Company, 1932, pp. 350-363.




Character and personality tests. Tests of character and personality
and interest, attitude, and adjustment questionnaires
are in the process of development and experimentation. The
technicians have perfected very few instruments in the field
which are useful in the school except where there is a trained
tester(12) with a knowledge of the possibilities and limitations
of such instruments.

The  most  complete  and  comprehensive  summary  of  the 
work  in  measuring  character  and  personality  has  been  made 
by  Symonds  in  his  recent  volume  Diagnosing  Personality  and 
Conduct  (13).  If  one  is  working  with  these  tests  he  should 
be  familiar  with  this  treatment  of  the  subject. 

The  uses  of  such  tests  reported  by  a very  few  schools  are: 

1.  To  bring  about  a better  understanding  of  the  pupils’  capabilities  when 
discussing  educational  and  vocational  plans  of  the  pupils  with  the  par- 
ents. (7  schools). 

2.  To  aid  in  studying  and  advising  failing  pupils.  (6  schools). 

3.  To  aid  in  determining  whether  pupils  should  be  recommended  to  college 
or  university.  (5  schools). 

2. What Types of Testing Programs Are the Schools Conducting?

The  standardized  testing  program  of  secondary  schools  in- 
cludes largely  group  and  individual  intelligence  tests  and 
achievement  tests.  (14)  Using  these  three  types  of  tests,  there 
are  eight  combinations  that  a school  could  choose  from:  giving 
all  three  would  be  one,  giving  group  and  individual  intelli- 
gence tests  and  not  achievement  tests  would  be  a second,  and 
giving  none  at  all  would  be  a third.  Using  the  replies  received 
in the present study, the per cent of schools following each
of the eight practices has been classified in Table V according
to the type of school and size of community. The data of
Table  V were  first  analyzed  according  to  size  of  school  as  well 
as  size  of  community,  but  no  consistent  increase  in  the  amount 
of  testing  done  occurred  with  an  increase  in  the  size  of  the 
school.  There  was  a difference  between  small  schools  with 
less  than  500  enrolment  and  larger  schools,  but  there  was  no 
consistent  difference  between  the  various  groupings  of  schools 
larger  than  500  pupils. 
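
The eight combinations mentioned above arise because each of the three types of tests is either given or not given. A minimal sketch of the enumeration follows; the names are illustrative, and the order in which the combinations are produced here is simply that of the enumeration and is not claimed to be the column order of Table V.

```python
from itertools import product

test_types = (
    "individual intelligence",
    "group intelligence",
    "standardized achievement",
)

# Each practice is a tuple of three booleans: True means that type of test is given.
practices = list(product((True, False), repeat=len(test_types)))

for number, practice in enumerate(practices, start=1):
    given = [name for name, used in zip(test_types, practice) if used]
    print(number, ", ".join(given) if given else "no standardized tests")

print(len(practices))
```

The first combination listed is the one in which all three types are given and the last is the one in which none is given, with the six partial programs between them.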

12.  The  term,  a trained  tester,  is  not  used  here  to  refer  to  one  having  been 
trained  to  give  “Binets”  but  rather  to  one  trained  in  this  special  field. 

13.  Symonds,  Percival  M.  op.  cit. 

14.  Koos,  L.  V.  and  Kefauver,  G.  N.  op.  cit.  p.  282. 




The outstanding fact shown in Table V is that the kinds of
tests given depend largely upon the size of the community.
Studying column I, the per cent of schools giving individual
and group intelligence tests and standardized achievement
tests increases, for the junior high schools, from 33.3% in small
towns to 67.2% in large cities, with similar increases noted for
the senior and six year high schools.

TABLE V

The Per Cent of Schools Following Certain Testing Practices,
Analyzed According to Type of School and Size of Community

Individual Intelligence        *†     *      *      *      -      -      -      -
Group Intelligence             *      *      -      -      *      *      -      -   No. of
Standard Achievement           *      -      *      -      *      -      *      -    Cases
Column (1)                   (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)     (10)

Junior H. S.
  under 25,000 (S)‡         33.3    0.0    0.0    0.0   33.3    4.8   28.6    0.0       21
  25,000-100,000 (A)        42.8    5.7    2.9    2.9   22.8    5.7   14.3    2.9       35
  above 100,000 (L)         67.2    1.5    4.5    1.5   14.9    7.4    3.0    0.0       67

Senior H. S.
  under 25,000 (S)          25.6    2.6    0.8    0.0   40.2    6.0   11.1   13.7      117
  25,000-100,000 (A)        40.0    4.3    1.5    0.0   30.0   10.0    7.1    7.1       70
  above 100,000 (L)         50.6   11.2    1.1    1.1   19.1    3.4    4.5    9.0       89

Six Year H. S.
  under 25,000 (S)          42.2    0.0    0.0    0.0   42.2    6.6    4.5    4.5       45
  25,000-100,000 (A)        71.4    0.0    0.0    0.0   21.4    7.2    0.0    0.0       14
  above 100,000 (L)         80.0    2.9    0.0    0.0   17.1    0.0    0.0    0.0       35

Total
  Junior H. S.              54.5    2.4    3.3    1.6   20.3    6.5   10.6    0.8      123
  Senior H. S.              37.3    5.8    1.1    0.3   30.8    6.2    8.0   10.5      276
  Six Year H. S.            60.6    1.1    0.0    0.0   29.8    4.3    2.1    2.1       94

Grand Total                 46.0    4.1    1.4    0.6   28.0    5.9    7.5    6.5      493

† * indicates which type of test is given and - indicates which type is not
given. The first column gives the per cent which give all three
types of tests; the second column, * * -, indicates those schools which give both
individual and group intelligence tests but do not give standardized achievement
tests. The column - - - gives the per cents for those schools not using
any of the three types of tests. The last column gives the number of cases
upon which the per cents in that row were calculated.

‡ S, A and L will be used hereafter to refer to the size of the community
as small, average and large.

Most of the schools are carrying out a testing program in
some form, for the per cents in column 9 are very small and
in some cases zero. The group doing the least testing is the
senior high school in communities of less than 25,000, 13.7% of
which do no testing. Using the returns from the follow-up and
limiting it to schools in such towns which have enrolments of
less than 500, 28% of such schools do no standardized testing.
(This is not given in tabular form.)

The  need  of  testing  in  small  schools  is  nearly  as  great  as  in 
large  schools.  Even  though  there  is  no  possibility  of  classifica- 
tion, intelligence  tests  are  useful  alike  in  small  and  large 
schools  in  guidance  and  as  a measure  of  ability  to  learn. 
Teachers  need  to  be  able  to  diagnose  learning  difficulties  and 
to  know  what  their  pupils  have  achieved  in  a small  school  as 
well  as  a large  one.  In  fact  the  use  of  achievement  test  results 
to  compare  pupils  with  the  norms  is  even  greater  in  the  small 
school,  for  here  they  do  not  have  the  large  number  of  pupils 
in  any  subject  within  their  own  school  on  which  comparisons 
of  achievement  can  be  made. 

The  lower  part  of  Table  V summarizes  the  per  cent  of 
junior,  senior  and  six  year  high  schools  with  each  type  of  pro- 
gram. Nearly  five-eighths  (60.6%)  of  the  six  year  high  schools 
are  using  all  three  kinds  of  tests  as  compared  with  slightly 
over half (54.5%) of the junior highs and with three-eighths
of  the  senior  high  schools.  Over  ten  per  cent  of  the  senior  high 
schools  do  no  standardized  testing.  The  lesser  amount  of  test- 
ing on  the  senior  high  school  level  is  thus  clearly  marked. 
Certainly  measurement  with  its  values  for  guidance,  recogni- 
tion of  individual  differences,  classification,  supervision,  diag- 
nosis, and  marking  is  as  much  needed  here  as  in  the  junior 
or  six  year  high  school. 

To  facilitate  seeing  the  relative  emphasis  placed  on  each 
kind  of  tests  by  schools,  the  data  of  Table  V have  been  com- 
bined and  are  presented  in  Table  VI. 
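
The way Table VI is obtained from Table V can be illustrated with a short sketch. It assumes that the eight per cent columns of Table V run in the order "all three tests", "individual and group only", "individual and achievement only", and so on down to "no tests"; that ordering is an inference from the footnote to Table V and from the totals in Table VI, not something the text states outright. The names PATTERNS and combine are illustrative only.

```python
# Each tuple is (individual, group, achievement); True means that type of
# test is given. The column order is an assumption inferred from the
# footnote to Table V and the totals in Table VI.
PATTERNS = [
    (True, True, True), (True, True, False),
    (True, False, True), (True, False, False),
    (False, True, True), (False, True, False),
    (False, False, True), (False, False, False),
]

# Per cents from the "Grand Total" row of Table V.
GRAND_TOTAL = [46.0, 4.1, 1.4, 0.6, 28.0, 5.9, 7.5, 6.5]


def combine(row):
    """Sum the pattern columns to get the per cent giving each type of test."""
    totals = [0.0, 0.0, 0.0]
    for pattern, pct in zip(PATTERNS, row):
        for i, given in enumerate(pattern):
            if given:
                totals[i] += pct
    return [round(t, 1) for t in totals]


individual, group, achievement = combine(GRAND_TOTAL)
print(individual, group, achievement)
```

Applied to the Grand Total row, the sketch gives 52.1, 84.0 and 82.9, which agree with the Grand row of Table VI.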

Here  again  an  increase  in  the  per  cent  of  schools  which  give 
tests  is  noticed  as  the  size  of  the  community  increases.  This 
increase is especially marked for individual and group intelligence
tests, and less so for standardized achievement tests. The per
cent of schools, junior high, senior high, or six year, giving
individual intelligence tests is roughly twice as great in the
large communities as in the small. For group intelligence and
achievement tests this discrepancy is greatly decreased.

The fact that the senior high school is doing the least testing
is apparent when the totals are studied. Over one-fifth (22.8%)
of the senior high schools do not use achievement tests as
compared with about one-tenth (11.3%) of the junior high schools.
The  indication  is  that  the  six  year  high  school  has  the  largest 
per  cent  of  schools  giving  tests.  One  is  inclined  to  wonder  if 
the  recency  of  the  organization  of  the  types  of  schools  has  any- 
thing to  do  with  their  acceptance  of  progressive  methods  and 
instruments. 

TABLE VI

The Per Cent of Schools Giving Various Types of Tests
(An Adaptation of Table V)*

Type of School        Individual      Group       Standardized
                    Intelligence   Intelligence    Achievement

Junior H. S.     S**        33.3           71.4           95.2
                 A          54.3           77.0           82.8
                 L          74.7           91.0           89.6

Senior H. S.     S          29.0           74.4           77.7
                 A          45.8           84.3           78.6
                 L          64.0           84.3           75.3

Six Year H. S.   S          42.2           91.0           88.9
                 A          71.4          100.0           92.8
                 L          82.9          100.0           97.1

Totals
  Junior                    61.8           83.7           88.7
  Senior                    44.5           80.1           77.2
  Six Year                  61.7           95.8           92.5
  Grand                     52.1           84.0           82.9

*See Table V for the number of schools in each row.

**S = cities under 25,000; A = cities between 25,000 and 100,000; L = cities above
100,000.


Group  intelligence  and  standardized  achievement  tests  seem 
to  be  about  equally  popular,  slightly  over  four-fifths  of  the 
schools  making  use  of  both  types  of  tests. 

Summary. Only a relatively small per cent (6.5%) of the
secondary schools are not making use of standardized measures
in some form. Over one-fifth of the remainder are using either
intelligence tests or standardized achievement tests alone,
which means that those schools have only a partial testing
program. Individual intelligence tests are not used extensively,
especially in small communities. This is not surprising, as
trained people are required and the tests are very time
consuming. Over a fourth (28%) of the small high schools in
communities under 25,000 do no standardized testing of any kind.
This fact seems to show a need of a planned program of testing
which will be especially suitable for this type of school.

3.  Who  Should  Be  Responsible  for  the  Selection  of  Tests? 

The  selection  of  tests  is  becoming  an  increasingly  complex 
problem  as  the  number  of  available  tests  increases.  With  over 
610  tests  published  which  are  suitable  for  use  in  the  second- 
ary schools  one  would  have  to  be  an  expert  to  be  able  to  select 
the  best  tests  in  each  field.  The  difficulties  presented  lead  to 
the  general  principle  that  the  best  fitted  person  in  the  system 
should  select  the  tests.  This  person  in  large  cities  will  probably 
be  the  research  director.  Hildreth  says,  “The  supervision  of 
measurement  by  a central  bureau  insures  unity  in  the  testing 
program  and  facilitates  the  wide  spread  of  measurements  in 
the  large  city  centers”  (15).  In  a small  system  where  such  ex- 
pert advice  is  not  available  the  principal,  or  some  other  person 
on  the  staff  of  the  school,  should  have  sufficient  training  to 
either  select  or  approve  the  selection  of  intelligence  and 
achievement  tests. 

Intelligence  Tests.  Over  three-fourths  of  the  responsibility 
for  the  proper  selection  of  intelligence  tests  is  assumed  by 
four  school  officials,  the  research  department  (41%),  the  prin- 
cipal (22%),  the  superintendent  (8%),  and  the  counselor 
(5%).  Other  persons  or  combinations  of  persons  are  respons- 
ible in  the  remaining  schools. 

There  is  a marked  change  in  the  responsibility  which  the 
principal  exercises  as  the  size  of  the  communities  increases. 
In  small  towns  the  principal  usually  has  charge  of  the  selec- 
tion, while  in  large  cities  where  research  departments  exist, 
they  usually  exercise  this  function.  This  is  well  illustrated  in 
Table VII, where in over half (53%) of the senior high schools in
small towns the principals chose the intelligence tests, while in
the communities of average size and in large cities this per cent
decreases to 32% and 3%, respectively.

These  data  seem  to  suggest  that  high  school  principals  in 
small  communities  should  have  sufficient  knowledge  to  enable 
them  to  select  suitable  intelligence  tests  for  use  in  their  schools 

15.  Hildreth,  Gertrude  H.  Psychological  Service  for  School  Problems.  Yonkers- 
on-Hudson,  New  York:  World  Book  Company,  1930.  p.  66. 




TABLE VII

Per Cent of Schools in Which Various Persons are Responsible
for the Selection of Tests

                          Junior         Senior        Six Year         Total      Grand
Persons Selecting       S    A    L    S    A    L    S    A    L    J    S    6   Total

Intelligence Tests:
Research               21   55   72    6   38   64    5   38   63   61   35   33      41
Principal              29    9    2   53   32    3   35    0    9    7   30   20      22
Superintendent          7    0    6   16    3    1   22    8    0    5    7   11       8
Counselor               0    9    0    1    3   11    0   23   19    2    5   11       5
Other Plan             43   27   20   24   24   21   38   31    9   25   23   25      24
Number of Schools      14   22   64   82   60   72   37   13   32  100  214   82     396

Achievement Tests:
Research                6   36   48    3    9   34    2   30   40   37   14   20      21
Department Head         0    0   12    5   31   24    0   20   10    7   18    6      13
Teacher & Prin.        41    4    3   27   10    2   18   10    3   10   15   11      13
Teachers                6    4    0   20    9    5   12    0    7    2   12    9       9
Principal               6    4    3   16    5    3   15    0    0    4    9    8       7
Sup't or Ass't         12    8   10    8    2    2   20    0    3   10    4   11       7
Other Plan             29   44   24   21   34   30   33   40   37   30   28   35      30
Number of Schools      17   25   59   90   58   62   40   10   30  101  210   80     391

or test service should be provided, say by the state department.
That such is not the case now is brought out in a later
section (Chapter IV, Section 6).

Other  plans  for  the  selection  of  these  tests,  which  consti- 
tute 24%  of  the  total,  are  extremely  varied.  The  most  popular 
plans  among  these  consist  of  some  combination,  usually  invol- 
ving the  principal,  such  as  the  principal  with  the  research 
department  or  counselor,  or  teacher,  or  as  is  sometimes  the 
practice,  the  state  department  of  education.  Some  schools 
depend  on  a committee  of  teachers,  usually  called  a test  com- 
mittee, others  on  one  teacher  who  is  a specialist  in  testing, 
and  in  one  case  this  function  is  even  delegated  to  a clerk. 
Agencies outside the immediate school system functioning in
this  capacity  are  state  departments  of  education,  county  boards 
of  education  and  universities. 

Achievement Tests. The responsibility for the selection of
achievement tests is placed on an even greater variety of persons
than that for the intelligence tests. This is partly due to
the fact that there is a great difference in practice in the
various sizes of communities. If high schools in small communities
were examined one would say that achievement tests are selected
by either the principal or teacher or the two working together.
However, if high schools in large cities were studied alone one
would say that tests are selected by the research department or
department head, and that teachers or principals have relatively
little to do with making the selection.

Table  VII  shows  significant  facts  concerning  the  selection 
of  achievement  tests.  First,  the  responsibility  of  the  research 
department  increases  with  an  increase  in  community  size. 
Second,  the  function  of  the  principal  in  exercising  sole  con- 
trol or  in  approving  the  choice  of  the  teacher  decreases  as  the 
size of the community increases. Third, there is a need for
principals, especially in small towns, to know enough about
achievement tests in all subject matter fields either to approve
the choice of their teachers or to make the selection themselves. This
is  important  for  in  the  small  communities  the  principals  have 
something  to  do  with  the  selection  of  tests  in  at  least  47%  of 
the  junior  high  schools,  43%  of  the  senior  high  schools,  and 
33%  of  the  six  year  schools. 
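
The three per cents just quoted appear to be the sums of two rows of Table VII for the small communities, the "Teacher & Prin." row and the "Principal" row. That reading is an assumption, since the text does not say how the figures were obtained, but the arithmetic can be checked with a brief sketch (the dictionary and its keys are illustrative):

```python
# Per cents for small communities (S) from the achievement-test half of
# Table VII. Assumption: "principal involvement" means the "Teacher & Prin."
# row plus the "Principal" row; the text does not state this explicitly.
SMALL_COMMUNITIES = {
    "junior": {"teacher_and_principal": 41, "principal": 6},
    "senior": {"teacher_and_principal": 27, "principal": 16},
    "six_year": {"teacher_and_principal": 18, "principal": 15},
}


def principal_involvement(rows):
    """Add the two rows in which the principal takes part in the selection."""
    return {school: sum(parts.values()) for school, parts in rows.items()}


involvement = principal_involvement(SMALL_COMMUNITIES)
for school, pct in involvement.items():
    print(f"{school}: {pct}%")
```

The sums come to 47, 43 and 33, matching the per cents in the text. The words "at least" presumably allow for principals who also take part under some of the "Other Plan" arrangements.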

Supervision is exercised more in the larger communities.
This is brought out clearly by the fact that 58% of senior high
schools in the large cities have their achievement tests selected
by either the research department or the department head as
compared to the 8% in the small towns.

The  problem  of  selection  of  proper  achievement  tests  be- 
comes increasingly  complex  with  the  number  of  teachers 
served.  Obviously  one  program  of  achievement  testing  cannot 
be  planned  in  a large  city  so  it  is  suitable  for  all  secondary 
school  classrooms  and  furnishes  the  maximum  of  benefit  to 
each individual child. To meet this need about a third of the
cities  over  100,000  publish  approved  lists  from  which  indi- 
vidual schools  or  teachers  can  select  achievement  tests  for 
their  use.  This  plan  makes  possible  critical  judgment  on  the 
worth-whileness  of  the  test  by  the  research  department  before 
it  is  placed  on  the  list  and  has  the  advantage  of  allowing  the 
teacher  or  department  head  freedom  in  the  selection  of  tests 
suitable  for  their  needs.  Incidentally,  it  is  an  economical  plan, 
for  a department  can  buy  in  quantities  and  need  only  carry  in 
stock  a limited  variety  of  tests. 




4.  Summary 

Junior High Schools. The majority of the junior high
schools give all three types of tests: individual intelligence,
group intelligence, and standardized achievement tests. Practically
all  do  some  standardized  testing.  The  intelligence  tests  are 
usually  selected  by  the  research  departments.  Achievement 
tests  tend  to  be  selected  by  the  research  departments,  teacher 
and  principal,  and  superintendent  or  his  assistant. 

Senior  High  Schools.  The  senior  high  schools  are  about 
evenly  divided  between  those  giving  all  three  types  of  tests 
and  those  giving  only  group  intelligence  and  achievement 
tests. Slightly more than one school out of ten does no
standardized testing. The achievement tests are selected by the
department head, teacher and principal, research department, or
teachers, each in about the same number of schools.

Six  Year  High  Schools.  More  of  these  secondary  schools 
give  all  three  types  of  tests  than  do  either  the  junior  or  senior 
high  schools.  Only  two  schools  out  of  ninety-four  reported 
that  they  did  no  standardized  testing.  Intelligence  tests  are 
selected  in  most  of  the  schools  by  either  the  research  depart- 
ment or  the  principal.  The  research  department,  teachers  and 
principal,  and  superintendent  or  assistant  select  the  achieve- 
ment tests. 

SELECTED  BIBLIOGRAPHY  OF  HIGH  SCHOOL  TESTS 

The  references  in  the  following  bibliography  contain  lists  of  tests  suitable 
for  use  in  the  secondary  schools.  The  principal  publishers  of  standardized  tests 
are  also  listed. 

Freeman,  Frank  N.  Mental  Tests.  Boston:  Houghton  Mifflin  Company,  1926,  pp. 
181-186. 

Lists  the  intelligence  tests  available  with  rather  complete  information  on 
each  one. 

Hildreth, Gertrude H. Psychological Service for School Problems.
Yonkers-on-Hudson, New York: World Book Company, 1930, pp. 281-310.

Lists  representative  intelligence  and  achievement  tests  for  elementary  and 
junior  high  school. 

Hildreth,  Gertrude  H.  Bibliography  of  Mental  Tests  and  Rating  Scales.  New 
York:  Psychological  Corporation,  1933. 

Lists  over  3,000  different  tests. 

Kelley, Truman L. Interpretation of Educational Measurement.
Yonkers-on-Hudson, New York: World Book Company, 1927, pp. 220-348.

Contains  the  then  valuable  intelligence  and  achievement  tests  rated  for 
their  general  excellence  for  individual  measurement. 




Kinder, J. S. and Odell, Charles W. “Educational Tests for Use in Institutions
of Higher Learning.” University of Illinois Bulletin, Vol. 27, No. 49,
Educational Research Circular No. 55, Urbana: University of Illinois,
August 5, 1930, 95 pp.

Lists some tests suitable in upper grades of high school.

Monroe, Walter S. “Standardized Tests for the High School.” Journal of
Educational Research, 1:151-153, 229-242, 311-320, February, March, April, 1920.

Of interest because it is the first published bibliography of high school
tests.

Odell, Charles W. “Educational Tests for Use in High Schools, Third Revision.”
University of Illinois Bulletin, Vol. 27, No. 3, Educational Research Circular
No. 53, Urbana: University of Illinois, September 17, 1929, 59 pp.

Lists high school tests giving forms, purpose, time, publisher, and price.

Odell, Charles W. Educational Measurement in High School. New York: The
Century Company, 1930, pp. 90-441.

Describes in detail the tests available in 1930.

Ruch, G. M. and Stoddard, George D. Tests and Measurements in High School
Instruction. Yonkers-on-Hudson, New York: World Book Company, 1927,
pp. 71-247.

Discusses  rather  critically  the  available  tests. 

Smith, Henry Lester and Wright, W. W. Second Revision of the Bibliography of
Educational Measurement. Bloomington, Indiana: Bulletin of the School of
Education, Indiana University, Vol. 4, No. 2, 1927, 251 pp.

Complete  list  of  the  tests  available  in  1927. 

Symonds, Percival M. Measurement in Secondary Education. New York: The
Macmillan Company, 1927, pp. 53-230.

Discusses  critically  the  high  school  tests. 

Tiegs,  Ernest  W.  Tests  and  Measurements  for  Teachers.  Boston:  Houghton 
Mifflin  Company,  1931,  pp.  315-317,  358-451. 

Briefly describes some high school tests.

Woody,  Clifford  “Standardized  Tests  Designed  for  Use  in  Institutions  of  Higher 
Learning.”  Eighteenth  Yearbook  of  the  National  Society  of  College  Teachers  of 
Education.  Chicago:  University  of  Chicago  Press,  1930,  pp.  18-72. 

Lists  some  tests  which  are  usable  on  high  school  level. 

PRINCIPAL  PUBLISHERS  OF  SECONDARY  SCHOOL  TESTS 

Bureau  of  Educational  Measurement  and  Standards,  Kansas  State  Teachers 
College,  Emporia,  Kansas. 

Bureau  of  Educational  Research  and  Service,  University  of  Iowa,  Iowa  City, 
Iowa. 

Educational  Test  Bureau,  Minneapolis,  Minn. 

C.  A.  Gregory,  Cincinnati,  Ohio. 

Ginn  and  Company,  Boston,  Mass. 

Gregg  Publishing  Company,  New  York  City 

Harlow  Publishing  Company,  Oklahoma  City,  Oklahoma 

Houghton  Mifflin  Company,  Boston,  Mass 

Public School Publishing Company, Bloomington, Illinois

Smith, Hammond Company, Atlanta, Ga.


Southern  California  School  Book  Depository,  Los  Angeles,  Calif. 

Southwestern  Publishing  Company,  Chicago,  Illinois. 

Stanford University Press, Stanford University, Palo Alto, Calif.

Teachers College, Bureau of Publications, Columbia University, New York City

World Book Company, Yonkers-on-Hudson, New York

SELECTED  REFERENCES 

Department of Superintendence “Character Education.” Tenth Yearbook,
Washington, D. C.: National Educational Association, 1932, Chapter XVI.

A brief discussion on uses of character tests and also a most complete
bibliography of such tests.

Hildreth, Gertrude H. Psychological Service for School Problems.
Yonkers-on-Hudson, New York: World Book Company, 1930, Chapter IV.

Provides  a brief  comprehensive  treatment  of  the  administration  of  testing. 

Mort, Paul P. The Individual Pupil. New York: American Book Company, 1928,
Chapter VIII.

Contains  a description  of  certain  administrative  aids  in  discovering  the 
needs  of  pupils. 

Odell, C. W. Educational Measurement in High School. New York: The Century
Company, 1930, Chapter III.

Presents criteria for the selection of tests.

Pressey, Sidney L. and Pressey, Luella Cole Introduction to the Use of
Standard Tests. Yonkers-on-Hudson, New York: World Book Company, Revised
Edition, 1931, Chapters XIV-XV.

Discusses  the  planning  of  the  testing  program. 

Ruch, G. M. and Stoddard, G. D. Tests and Measurements in High School
Instruction. Yonkers-on-Hudson, New York: World Book Company, 1927,
Chapter IV.

Outlines  criteria  for  the  selection  of  educational  tests. 

Symonds, Percival M. Measurement in Secondary Education. New York: The
Macmillan Company, 1927, Chapter XIV.

Discusses need and methods of improving measurement in the high school.

Symonds,  Percival  M.  Diagnosing  Personality  and  Conduct.  New  York:  The 
Century  Company,  1931,  602  pp. 

The most comprehensive discussion of measures of personality and conduct
which has appeared to date. Chapter III deals with rating methods while
Chapter VIII deals with tests of conduct, knowledge and judgment.

Symonds, Percival M. “The Testing Program for the High School.” School
Review, 40:97-108, February, 1932.

Suggests  desirable  testing  programs  for  the  high  school. 

Tiegs,  Ernest  W.  Tests  and  Measurements  for  Teachers.  Boston:  Houghton  Mif- 
flin Company,  1931,  Chapter  XVI. 

An  excellent  discussion  of  criteria  for  the  selection  of  standardized  tests. 

Wood,  Ben  D.  Measurement  in  Higher  Education.  Yonkers-on-Hudson,  New 
York:  World  Book  Company,  1923,  Chapter  VII. 

An  outline  of  some  principles  of  measurement. 


CHAPTER  IV 

ADMINISTRATION  OF  INTELLIGENCE  TESTS 

Introduction 

It  is  the  purpose  of  this  chapter  to  present  whatever  results 
of  experimental  evidence  are  available  on  questions  dealing 
with  the  administration  of  intelligence  tests.  The  present  prac- 
tice of  the  schools  is  studied  in  relation  to  the  scientific  find- 
ings. Changes  in  practice  are  suggested  wherever  necessary 
to  make  it  conform  with  what  evidence  indicates  is  desirable. 

There  are  many  questions  which  arise  in  connection  with 
the  administration  of  intelligence  tests,  some  of  which  have 
been  answered  by  careful  studies;  others  can  only  be  answered 
at  present  by  taking  the  judgment  of  those  who  are  thinking 
and  working  in  this  field.  Such  questions  as  the  following  will 
be  discussed.  The  number  corresponds  to  the  section  in  which 
they  are  treated. 

1.  What  per  cent  of  pupils  should  be  given  intelligence  tests? 

2.  How  often  should  intelligence  tests  be  given? 

3.  In  what  grades  should  intelligence  tests  be  given? 

4.  At  what  time  during  the  year  should  intelligence  tests  be  given? 

5. Who shall be given intelligence tests?

6.  What  intelligence  tests  should  be  used? 

7.  How  comparable  are  results  of  different  tests? 

8.  Who  should  give  intelligence  tests? 

9.  Who  should  score  the  tests? 

10.  How  should  I.  Q.s  be  figured? 

12.  How  should  intelligence  test  results  be  recorded? 

13.  To  whom  should  intelligence  test  results  be  made  available? 

14.  What  uses  are  made  of  intelligence  tests? 

1. What Per Cent of Pupils Should Be Given Intelligence Tests?

This  question  can  only  be  answered  after  a school  has  de- 
cided for  what  purpose  intelligence  tests  are  being  given. 
If  the  purposes  and  uses  as  set  forth  in  Chapters  I and  III  are 
accepted  a definite  answer  can  be  provided  for  the  question. 




In  discussing  the  relation  of  the  mental  test  program  to  guid- 
ance, Koos  and  Kefauver  answer  the  question  with  the  state- 
ment, “An  adequate  program  will  certainly  require  a mental 
test  rating  for  every  student”  (1). 

The  mental  test  rating  can  be  obtained  from  either  a group 
or  an  individual  intelligence  test.  In  most  cases  results  of 
group  tests  are  adequate  but  there  are  some  cases  in  which 
individual  tests  are  advisable. 

With these limitations in mind the secondary schools should
place  most  of  their  emphasis  on  the  group  tests.  That  this  is 
now  being  done  can  be  seen  by  referring  to  Table  VIII  which 
gives  the  median  percentage  of  pupils  upon  which  group 
intelligence  test  results  are  available  (upper  part  of  table). 

TABLE VIII

The Percentage of Pupils upon which Group and Individual
Intelligence Test Results are Available

                               Junior         Senior        Six Year         Total      Grand
                             S    A    L    S    A    L    S    A    L    J    S    6   Total

Group Intelligence
  Median Per Cent           93   99  100   95   97  100   99  100  100  100   97  100      97
  Per Cent of Schools
    Reporting 100%          39   49   56   41   42   50   48   71   50   51   44   52      47
  Number of Schools         18   35   64  112   69   82   42   14   34  117  263   90     470

Individual Intelligence
  Median Per Cent*           0    0  1.7    0    0  1.2    0  2.5  3.0  0.8    0  1.5       0
  Per Cent of Schools
    Reporting 5% or More    11   22   21   11   16   19   25   31   35   20   15   30      19
  Number of Schools         18   32   56  108   67   80   44   13   31  106  255   88     449

*Median per cent refers to the median per cent of pupils on which results
of intelligence tests are available.


It  can  be  seen  that  only  47%  of  the  schools  are  actually  meet- 
ing the  need  of  a “mental  test  rating  for  every  student.”  The 
extent  to  which  schools  report  results  available  on  all  their 
pupils  varies  for  the  type  of  school  and  size  of  community. 
The  schools  in  smaller  towns  report  100%  less  often  than  the 
schools  in  large  cities.  Not  quite  as  large  a per  cent  of  the 
senior  high  schools  (44%)  come  up  to  the  standard  as  do  the 
junior  or  six  year  high  schools  (51%  and  52%).  It  is  noted  that 

1.  Koos,  L.  V.  and  Kefauver,  G.  N.  Guidance  in  Secondary  Schools.  New 
York:  The  Macmillan  Company,  1932,  p.  308. 




84%  of  the  schools  are  giving  group  intelligence  tests  (Table 
VI)  and  only  47%  reported  data  available  on  all  pupils.  These 
facts  mean  that  44%  of  the  schools  which  are  giving  intelligence 
tests  are  not  providing  an  adequate  mental  testing  program. 
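
The 44% follows from the two grand totals just cited. A minimal check is sketched below, assuming that the 84% of Table VI and the 47% of Table VIII can be treated as per cents of the same body of schools; the two tables rest on slightly different numbers of replies, so the result is approximate. The function name is illustrative.

```python
# Grand-total per cents quoted in the text.
GIVING_GROUP_TESTS = 84.0  # schools giving group intelligence tests (Table VI)
FULL_COVERAGE = 47.0       # schools with results on 100% of pupils (Table VIII)


def shortfall_among_testers(giving, full):
    """Per cent of the test-giving schools that lack results on every pupil."""
    return (giving - full) / giving * 100


shortfall = shortfall_among_testers(GIVING_GROUP_TESTS, FULL_COVERAGE)
print(round(shortfall))
```

Rounded to the nearest whole per cent, the result is 44.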

Individual  intelligence  test  results  are  available  on  a rather 
small  proportion  of  the  pupils  in  the  secondary  schools.  At 
least  half  the  schools  have  no  such  results  available  and  only 
19%  of  the  schools  have  Binet  I.  Q.s  on  5%  or  more  of  their 
pupils.  Another  way  of  stating  this  last  fact  is  that  less  than 
one  school  in  five  will  have  Binet  I.  Q.s  on  at  least  one  out 
of  every  twenty  of  their  students. 

Here,  as  in  the  case  of  the  group  tests,  the  per  cent  of  test 
results  differs  rather  markedly  in  different  sized  communi- 
ties, the  larger  cities  giving  relatively  more  tests.  There  are 
also  noticeable  differences  between  practice  in  the  different 
types  of  schools;  the  senior  high  school  shows  the  smallest 
percentages  of  results  available. 

Summary.  An  adequate  intelligence  testing  program  re- 
quires a minimum  of  one  mental  test  for  each  child  in  the 
school.  This  mental  test  rating  can  be  obtained  from  either  a 
group  or  an  individual  intelligence  test.  About  half  of  the 
junior  and  six  year  schools  and  somewhat  less  than  half  of 
the  senior  high  schools  are  meeting  this  requirement.  Indi- 
vidual intelligence  tests  are  given  to  a very  small  percentage 
of the pupils in about half of the schools. It appears that
over two-fifths (44%) of the schools which are giving intelligence
tests are not providing an adequate mental testing program.

2.  How  Often  Should  Intelligence  Tests  Be  Given? 

There  is  no  definite  answer  to  the  question  of  how  often 
intelligence  tests  should  be  given,  but  there  are  several  factors 
determining  a tentative  answer. 

First,  the  I.  Q.  obtained  from  the  Stanford  Binet  Test  is 
relatively  constant.  Summaries  of  studies  in  this  field  have 
been  made  by  Foran(2).  He  indicates  that: 

“The  more  important  causes  of  changes  in  the  I.  Q.  include  errors  in  the  use 
of  the  tests,  language  handicaps,  and  emotional  instability  that  is  either  a 
condition  or  a symptom — but,  even  as  it  is  now  and  excluding  causes  that 

2. Foran, T. G. “A Supplementary Review of the Constancy of the Intelligence
Quotient.” Educational Research Bulletin, Catholic University, Vol. 4,
No. 9, November, 1929, 22 pp.




may be guarded against, the quotient remains constant in spite of modifications
of the environment, training, interval between tests, and other factors that
have been discussed” (3).

The  constancy  of  the  Binet  I.  Q.  would  indicate  that  it  is 
only  necessary  to  give  one  individual  test  to  a child  and  that 
will  be  sufficient  in  practically  all  cases.  This  conclusion  does 
not  hold  for  group  tests  because  it  has  been  shown  that  the 
results of the different group tests do not furnish comparable
I. Q.s (Section 7), the material or possible test items
sampled  vary  from  test  to  test,  and  extraneous  distracting 
factors  disturb  the  results.  These  causes  for  variations  suggest 
the  need  of  giving  group  tests  frequently. 

Second,  the  predictive  value  of  group  intelligence  tests 
seems to decrease as the time interval increases. Data substantiating
this statement are contained in a study by Gates
and La Salle(4) on the elementary level. The National Intelligence
Scale and the Stanford Revision of the Binet were given
and  the  achievement  tests  were  administered  at  0,  4,  8,  12,  16 
and  20  month  intervals.  Their  findings  as  given  in  Table  IX 

TABLE IX

Correlations of a Group and Individual Intelligence Test
with Achievement Tests at Different Timed Intervals*

                        Number of Months Intervening
Tests                   0      4      8      12     16     20
Group (National)        .87    .83    .84    .85    .79    .76
Individual (Binet)      .69    .72    .73    .72    .70    .71

* Adapted from data included in Gates and La Salle, ibid., pp. 519-520.


indicate  that  correlations  between  the  National  Intelligence 
Test  and  achievement  tests  decrease  with  an  increase  in 
time  intervals  but  that  those  with  the  Binet  do  not  decrease 
but  remain  fairly  constant.  It  should  be  mentioned  here  that 
even  with  the  decrease,  the  group  intelligence  test  which  was 
used  in  this  experiment  predicted  achievement  better  twenty 
months  later  than  did  the  individual  test  at  any  time. 
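The two observations drawn from Table IX can be checked directly from the tabled correlations. The following is a minimal sketch in Python; the variable and function names are illustrative assumptions, and only the twelve correlations come from the table.

```python
# Correlations transcribed from Table IX, one value per testing interval.
MONTHS = [0, 4, 8, 12, 16, 20]
GROUP_NATIONAL = [0.87, 0.83, 0.84, 0.85, 0.79, 0.76]
INDIVIDUAL_BINET = [0.69, 0.72, 0.73, 0.72, 0.70, 0.71]


def net_change(values):
    """Change in correlation from the first interval to the last."""
    return round(values[-1] - values[0], 2)


group_change = net_change(GROUP_NATIONAL)      # negative: the group test declines
binet_change = net_change(INDIVIDUAL_BINET)    # near zero: the Binet holds steady

# Weakest group correlation compared with the strongest Binet correlation.
group_floor = min(GROUP_NATIONAL)
binet_ceiling = max(INDIVIDUAL_BINET)

print("Group change over 20 months:", group_change)
print("Binet change over 20 months:", binet_change)
print("Group test always higher:", group_floor > binet_ceiling)
```

The comparison of the lowest group value with the highest Binet value is what supports the statement that the group test predicted better at twenty months than the individual test did at any interval.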

Though  this  study  was  made  in  the  elementary  school  it 

3.  Ibid.,  p.  36. 

4. Gates, A. I. and La Salle, J. “The Relative Predictive Value of Certain
Intelligence and Educational Tests Together with a Study of Effect of Educational
Achievement upon Intelligence Test Scores.” Journal of Educational
Psychology, 14:517-539, December, 1923.




would seem possible to generalize the findings to the secondary
level, at least until similar studies are conducted in the
high  school.  For  the  high  school,  the  data  indicate  that  there 
is  need  for  administering  group  tests  at  intervals,  certainly 
not  more  often  than  once  a year  and  probably  every  two  or 
three  years  would  be  found  to  be  effective. 

Third,  practice  of  school  systems  and  opinions  of  workers 
in the field indicate the advisability of giving group intelligence
tests at frequently spaced intervals. Hildreth(5) recommends
testing mental ability at crucial points, say at the entrance
to the first, fourth, seventh, and ninth grades. Muncie,
Indiana  gives  mental  tests  at  the  beginning  of  the  first  grade 
and  in  the  last  semester  of  the  third,  sixth,  eighth,  and  ninth. 
Philadelphia  provides  for  city  wide  mental  surveys  at  the 
end  of  the  sixth  and  eighth  grades. 

These  few  references  to  practice  are  sufficient  to  indicate 
that  a number  of  school  systems  have  a definite  plan  for 
administering  tests  in  certain  grades  spaced  at  intervals  of 
two  to  three  years.  This  practice  is  in  harmony  with  the 
investigations of this problem. It also lends itself to a systematic
method of accumulating data.

3.  In  What  Grades  Should  Intelligence  Tests  Be  Given? 

The  previous  section  has  indicated  that  intelligence  tests 
should  be  given  several  years  apart  at  crucial  points  but  did 
not  specify  the  grades  in  which  they  were  to  be  given.  The 
problem  now  largely  resolves  itself  into  — What  are  the  crucial 
points  for  testing  in  the  secondary  school?  There  seem  to  be 
two  places  at  which  these  points  are  located.  The  entrance 
into  a new  unit  is  one.  That  means  that  such  tests  should  be 
given  either  just  before  or  after  entrance  to  the  seventh  grade 
for  the  junior  and  six  year  schools,  the  ninth  grade  for  the 
four  year  high  school,  and  the  tenth  grade  for  the  senior  high 
school.  The  second  crucial  point  should  be  at  a place  where  the 
majority of the pupils’ future educational plans are determined.
This is at present for most pupils at the beginning of
the ninth grade(6).

An  attempt  was  made  to  obtain  a picture  of  this  phase  of 

5. Hildreth, Gertrude H. Psychological Service for School Problems. Yonkers-on-Hudson,
New York: World Book Company, 1930, p. 62.

6. This point has been determined in a doctor’s dissertation now in progress
at Teachers College, Columbia University, by H. H. Mills.




the  testing  program  in  the  present  investigation  by  asking 
that  the  grades  be  checked  in  which  intelligence  tests  are 
given. The replies included nearly all the possible combinations.
These replies were classified roughly under five different
plans. The following in Table X are the procedures with
the  per  cent  of  schools  using  each. 

TABLE X

Per Cent of Schools Using Various Plans for Testing

Plan                                              Per Cent
1. In all grades                                    35.1%
2. In the first year of the school                  21.5%
3. In the first year and one other grade            15.2%
4. Previous to entering the school                  13.1%
5. In the last year of school                        5.8%
6. Other plans, or combinations of the above         9.3%
                                                   100.0%


There  are  practically  no  differences  between  the  practices 
of  the  three  types  of  schools  for  the  first  and  third  plans. 
However,  the  second  plan  is  only  followed  by  about  a twelfth 
of  the  six  year  high  schools  as  compared  with  a fifth  of  the 
junior  high  schools  and  a fourth  of  the  senior  high  schools. 
There is also variation as to the fourth plan: 21.0% of the
junior high schools have intelligence tests given to pupils
previous to entrance in the school as contrasted with 12.1%
for the senior and 6.0% for the six year school. The senior
high school is more apt to give tests in the last year than are
either of the others, 7.3% for the senior in comparison with
4.0% for the junior, and 4.8% for the six year high school.

There  seems  to  be  a need  in  a large  number  of  schools  for 
a more  definite  plan  of  testing.  This  conclusion  is  based  on  the 
variety  of  plans,  many  of  which  it  would  be  difficult  to  justify 
if  judged  from  the  viewpoint  of  maximum  value  to  the  school. 
It  seems  in  the  light  of  the  data  that  a desirable  minimum 
is  an  intelligence  test  just  before  or  after  entrance  to  either 
a junior, senior, four year or six year high school. An additional
testing point is needed for six year schools, at the
place where specialization of curriculum commences, probably
either the ninth or tenth grades. Provisions should also
be  made  for  testing  all  entering  students. 




4.  At  What  Time  During  the  Year  Should  Intelligence 
Tests  Be  Given? 

The  time  during  the  year  at  which  tests  are  given  is  not 
important  except  as  it  affects  the  use  of  the  test  results.  If 
tests  are  to  be  used  during  one  year  some  schools  test  at  the 
end  of  the  previous  year  and  some  at  the  beginning  of  the 
year.  Symonds  in  discussing  this  problem  suggests: 

“Usually  it  is  best  to  give  intelligence  tests  during  the  first  or  second 
meeting  in  the  new  term. — This  is  generally  better  than  giving  the  tests  at  the 
end  of  the  previous  term,  because  so  many  changes  occur  in  the  student  body 
of a school between sessions”(7).

However,  from  the  administrator’s  point  of  view  Douglass (8) 
believes  it  is  desirable  to  have  each  pupil’s  program  ready  to 
hand  to  him  on  the  opening  day  of  school.  If  pupils  are  to  be 
grouped homogeneously on the basis of the intelligence test
results  it  seems  necessary  to  have  that  information  before 
school  opens;  in  practice  this  means  testing  at  the  end  of  the 
previous  term. 

Of the four hundred schools reporting, 42% give
intelligence tests at the beginning of the semester or
year, 28% during the semester, 11% at the end, and 19% follow
some combination of the three. There are no observable
differences  in  practice  between  the  various  types  of  schools 
or  sizes  of  communities. 

5.  Who  Should  Be  Given  Individual  Intelligence  Tests? 

Individual  intelligence  tests  are  given  to  pupils  deviating 
in  some  manner  from  the  normal.  There  is  agreement  that 
it  is  necessary  to  give  such  tests  to  only  a small  proportion  of 
the  pupils.  Only  slightly  over  one-half  of  the  schools  give  any 
individual  intelligence  tests.  This  means  that  the  few  cases  in 
each  school  who  need  individual  study  and  analysis  are  being 
neglected. 

The  number  of  individual  intelligence  tests  to  be  given 
must  be  determined  on  the  basis  of  the  limitations  of  each 
school  system.  Individual  testing  requires  a trained  examiner 
and such a person can hardly give more than five tests during
the school day. Another difficulty is that the Stanford

7. Symonds, Percival M. Measurement in Secondary Education. New York:
The Macmillan Company, 1927, p. 490.

8. Douglass, Harl R. Organization and Administration of Secondary Schools.
Boston: Ginn and Company, 1932, p. 131.




Revision  of  the  Binet  Scales  is  not  suitable  for  bright  children 
much  beyond  twelve  or  thirteen  years  of  age  as  it  is  impossible 
to  obtain  a mental  age  over  eighteen  on  it.  This  means  that 
a twelve  year  old  child  could  not  get  an  I.  Q.  over  150,  that 
a fourteen  year  old  could  not  get  over  128,  and  a sixteen  year 
old over 112. A revision of the Stanford Binet is now in process
and more work is being done on the upper levels which
will  undoubtedly  make  the  test  more  suitable  for  use  in  the 
high  school. 
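The ceilings quoted above follow from the ratio definition of the intelligence quotient, 100 times mental age divided by chronological age. A short sketch of the arithmetic is given here; the function and constant names are illustrative assumptions, and the only datum taken from the text is the maximum mental age of eighteen.

```python
def iq(mental_age_years, chronological_age_years):
    """Ratio I.Q.: 100 times mental age over chronological age."""
    return 100.0 * mental_age_years / chronological_age_years


# Highest mental age obtainable on the Stanford Revision, as stated in the text.
BINET_MAX_MENTAL_AGE = 18

for age in (12, 14, 16):
    ceiling = iq(BINET_MAX_MENTAL_AGE, age)
    # The text's figures correspond to these values with the fraction dropped.
    print(f"age {age}: highest obtainable I.Q. about {ceiling:.1f}")
```

The ceiling falls as the pupil grows older, which is why the scale loses its usefulness for bright pupils of high school age.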

Individual  intelligence  tests  were  reported  given  to  some 
thirty-three  different  kinds  of  problem  cases.  These  thirty- 
three  types  are  listed  in  Table  XI,  ranked  according  to  the 
frequency  of  mention  for  the  first  sixteen  types.  The  other 
types  were  mentioned  only  once.  There  is  doubtless  much 
overlap  in  this  list,  but  the  wording  is  largely  copied  from 
the  original  statements  and  it  was  felt  that  the  value  of  the 
list  lay  in  presenting  as  nearly  as  possible  the  ideas  of  those 
reporting. 

This  list  has  value  to  the  counselor  or  psychologist  in  school 
practice  and  should  be  especially  valuable  to  anyone  planning 
a course  for  administrators  in  the  diagnosis  and  treatment 
of problem cases. In considering the types of cases tested,
the fact should be kept in mind that the I. Q. from a Binet
test will not be accurate for a very bright child much older
than twelve or thirteen. This means that for the types listed
as six, eleven, and item l, the test may not function on the high
school  level.  The  other  types  furnish  suggestions  for  use  and 
the  test  should  be  valid  unless  the  mental  age  of  the  pupil 
approaches  16  or  18. 

The  rather  long  list  of  special  cases  in  need  of  study  calls 
attention  to  the  problems  which  exist  in  the  high  school.  The 
individual  intelligence  test  does  not  furnish  data  on  all  phases 
of  these  problems  but  is  only  one  instrument  useful  in  this 
study.  Eikenberry,  in  discussing  the  training  of  the  high  school 
principal  mentions  this  problem,  stating: 

"Additional  study  of  adolescence  is  highly  desirable,  as  is  also  a study 
of  mental  hygiene. — The  mental  problems  of  pupils  are  probably  more  serious, 
considered from the standpoint of happiness, than are their physical and intellectual
problems” (9).

9. Eikenberry, D. H. “The Professional Training of Secondary School
Principals.” School Review, 38:438-509, September, 1930.




TABLE XI

Type of Cases Given Individual Intelligence Tests

1. *Failures
2. *Discipline cases
3. Problem cases
4. *Entering pupils
5. Cases of discrepancy**
6. Especially brilliant pupils
7. Especially dull pupils
8. Special cases
9. *Pupils in certain grades
10. Maladjusted cases
11. *Pupils in their last year in school
12. Personality difficulties
13. Clinic cases
14. Atypical pupils***
15. Social misfits
16. Request cases

The following were mentioned only once:

a. Cases out of the ordinary
b. Certain cases for study and research
c. Where more knowledge of the pupil is needed
d. Where some adjustment is necessary
e. Special guidance cases, educational, vocational or social
f. Adjustment cases
g. Pupils changing course
h. Pupils failing in more than half of their work
i. Slow pupils not failing
j. Where child is placed in school for crippled
k. Where child is being considered for special class
l. Especially brilliant pupils being considered for enriched class
m. Mentally ill pupils
n. Delinquents
o. Case studies
p. Abnormal retardation
q. New over-age pupils

*These responses were printed on the check list, which probably is a factor
in their placement in the list, though it probably did not affect the first two.

**Cases of discrepancy which were mentioned were between the results of
two intelligence tests, between I. Q. and marks, and where the work of the pupil
was below the teacher’s estimate of the pupil’s ability.

***Includes also “deviates from the normal.”

There are no rule-of-thumb methods of handling problem
cases. Careful training is required in this field before one can
come to understand the various types of cases and methods of
solution. Material which should be helpful to the principal
who  is  anxious  to  become  more  informed  in  this  field  can  be 




found in the works of Flemming(10), Hildreth(11),
Hirsch(12), Koos and Kefauver(13), Reavis(14), and Sayles
and Nudd(15). For more advanced reading Morgan(16) and
Symonds(17) are most useful.

6.  What  Intelligence  Tests  Should  Be  Used? 

The large number of available intelligence tests makes the
choosing of a test a somewhat difficult task. It is the purpose
of  this  section  to  state  a few  recognized  principles  which 
should  be  considered  in  such  selection,  and  to  give  the  tests 
which  are  most  frequently  used. 

First,  an  intelligence  test  should  be  valid.  Validity  in  the 
case  of  a group  intelligence  test  can  be  determined  in  any 
one of three ways or combinations thereof: (1) a high correlation
with the Binet, (2) a high correlation with a composite
made up of other available intelligence tests in the
same range, and (3) a reasonably high correlation with teachers’
judgments of intelligence. This type of information would
be  helpful  if  given  in  the  test  manual  but  it  seldom  is.  Though 
some  have  questioned  the  Binet  as  a valid  measure  it  would 
still  seem  to  be  usable  in  this  respect. 

Second,  an  intelligence  test  must  be  reliable.  Two  forms 
of  the  test  should  correlate  with  each  other  over  .90  and 
preferably  over  .95.  These  data  are  more  often  given  in  the 
manual  than  those  on  validity. 
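Where a manual does not report the correlation between forms, it can be estimated from the scores of pupils who have taken both. The sketch below computes the Pearson product-moment correlation and applies the .90 and .95 standards named above. The two score lists are hypothetical values invented for illustration, not data from this study, and the function names are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def reliability_verdict(r):
    """Apply the standards in the text: over .90 required, over .95 preferred."""
    if r > 0.95:
        return "preferred"
    if r > 0.90:
        return "acceptable"
    return "too low"


# Hypothetical scores of the same eight pupils on Form A and Form B.
form_a = [112, 95, 130, 88, 104, 121, 99, 140]
form_b = [110, 97, 127, 90, 101, 125, 96, 138]

r = pearson_r(form_a, form_b)
print(f"correlation between forms: {r:.3f} ({reliability_verdict(r)})")
```

In practice a much larger group than eight pupils would be needed before such a coefficient could be trusted.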


10. Flemming, Cecile W. Pupil Adjustment in the Modern School. New York:
Teachers College Bureau of Publications, Columbia University, 1931, 94 pp.

11. Hildreth, Gertrude H. “A Survey of Problem Pupils.” Journal of Educational
Research, 18:1-14, June, 1928.

Hildreth, Gertrude H. Psychological Service for School Problems. Yonkers-on-Hudson,
New York: World Book Company, 1930, Chapter V, “The Exceptional
Child” and Chapter VI, “Intensive Study of Individual Pupils.” pp. 88-145.

12. Hirsch, Everett C. “The Case Method of Dealing with Individual Difficulties
in the Secondary School.” School Review, 38:525-531, September, 1931.

13. Koos, L. V. and Kefauver, G. N. Guidance in Secondary Schools. New
York: The Macmillan Company, 1932, Chapter 13, “Preliminary Considerations
to Counseling,” and Chapter 14, “Counseling the Individual.” pp. 403-505.

14. Reavis, W. C. Pupil Adjustment in the Junior and Senior High School.
Boston: D. C. Heath and Company, 1926, 348 pp.

15. Sayles, Mary B. and Nudd, Howard W. The Problem Child in School.
New York: The Commonwealth Fund, Division of Publications, 1925, 238 pp.

16. Morgan, J. J. B. The Psychology of the Unadjusted School Child. New
York: The Macmillan Company, 1924, 300 pp.

17. Symonds, Percival M. The Nature of Conduct. New York: The Macmillan
Company, 1928, 346 pp.

Symonds, Percival M. Diagnosing Personality and Conduct. New York: The Century
Company, 1931, 602 pp.

Symonds, Percival M. Mental Hygiene of the School Child. New York: Macmillan
Company, 1934, 321 pp.




Third,  an  intelligence  test  should  be  chosen  which  will 
satisfactorily  measure  the  range  of  ability  it  is  desired  to 
measure.  This  statement  needs  more  elaboration  than  did 
the  two  previous  statements  as  they  are  commonly  mentioned 
in  most  works  in  measurements.  One  would  not  think  of  using 
the  Detroit  First  Grade  Intelligence  Test  to  measure  ninth 
graders.  Experience  with  a test  often  enables  one  to  judge 
whether or not it is suitable, but that is a rather costly procedure.

There  are  three  ways  of  telling  for  what  grades  a test  is 
most  suitable.  First,  the  grade  range  is  given  for  most  tests 
either  on  the  test  cover,  in  the  manual  of  directions,  or  in  the 
catalog  of  the  company  publishing  the  test.  The  test  should 
not  be  used  outside  this  range  unless  it  is  certain  from  other 
information,  that  the  test  functions  outside  these  grades.  Most 
companies  and  authors  tend  to  be  optimistic  concerning  the 
range  of  usefulness  of  their  tests;  hence  the  grades  usually 
constitute  the  maximum  range. 

Second,  the  norms  in  the  manual  give  a clue  as  to  the 
range  of  usefulness  of  the  test.  One  useful  method  is  to  take 
the  highest  mental  age  obtainable  on  the  test  and  divide  that 
by  the  chronological  age  of  the  oldest  pupil  in  the  group,  or  if 
such  a pupil  is  over  16,  use  age  16.  This  gives  the  highest  I.  Q. 
that  the  oldest  person  in  the  group  could  get.  The  procedure 
is reversed for the other end of the scale. The lowest mental
age available is divided by the chronological age of the youngest
person in the group. This gives the lowest I. Q. that the
youngest person in the group could get. There are two cautions
that need to be observed with this method, one being that
the extremes of tests are usually rather poorly standardized
and the other that taking the oldest in the class to obtain
the effectiveness of the upper ranges of the test and the youngest
for the lower range of the test gives the most extreme
situation possible. It is usually the youngest who is the brightest
and the oldest who is the dullest.

An illustration of this procedure for determining the usable
range of a test can be obtained by working with the norms
on the Terman Group Test of Mental Ability. The highest
mental age given is 19 years, 6 months and the lowest is 10
years, 4 months(18). Assuming that in a class of eighth
graders the ages range from 12 years to 16 years, the highest
possible I. Q. that the oldest could get
would be 19-6 ÷ 16-0 or 122. The lowest possible I. Q. that the




youngest could get would be 10-4 ÷ 12-0 or 86. Thus the minimum
range of I. Q.s on such a group would be from 86 to 122.
The maximum range would be from 19-6 ÷ 12-0 or 162 to
10-4 ÷ 16-0 or 64. This means that the lowest I. Q. which
pupils in the class could obtain would vary between 64 and
86, depending on their age, and the highest I. Q. which anyone
in this class could get would vary from 122 to 162.
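The procedure illustrated with the Terman norms can be set down as a short routine. This is a sketch only: the function names and the dictionary keys are assumptions made for illustration, ages are converted to months before dividing, and chronological ages above sixteen are replaced by sixteen as the text directs. The unrounded quotients agree with the figures printed above to within one point, the difference depending on how fractions are rounded.

```python
def to_months(years, months=0):
    """Convert an age written as years-months (for example 19-6) to months."""
    return 12 * years + months


def iq(mental_age, chronological_age):
    """Ratio I.Q.: 100 times mental age over chronological age (same units)."""
    return 100.0 * mental_age / chronological_age


def iq_limits(highest_ma, lowest_ma, youngest_ca, oldest_ca, ca_cap=to_months(16)):
    """I.Q. limits that a test's norms impose on one class.

    Chronological ages above ca_cap are replaced by ca_cap, following the
    rule of using age 16 for any pupil older than 16.
    """
    youngest = min(youngest_ca, ca_cap)
    oldest = min(oldest_ca, ca_cap)
    return {
        "highest_for_oldest": iq(highest_ma, oldest),
        "highest_for_youngest": iq(highest_ma, youngest),
        "lowest_for_youngest": iq(lowest_ma, youngest),
        "lowest_for_oldest": iq(lowest_ma, oldest),
    }


# Terman Group Test norms and the eighth grade class described in the text.
terman = iq_limits(
    highest_ma=to_months(19, 6),
    lowest_ma=to_months(10, 4),
    youngest_ca=to_months(12),
    oldest_ca=to_months(16),
)

for name, value in terman.items():
    print(f"{name}: {value:.1f}")
```

The same routine can be applied to any other test by substituting the highest and lowest mental ages given in its manual.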

The  point  of  the  previous  discussion  has  been  to  stress  the 
need  of  studying  the  range  for  which  the  test  is  suitable.  It 
is  also  advisable  to  test  pupils  who  score  at  the  extremes 
with a second test; this test should be suitable for a higher
or lower grade, depending on whether the pupil’s first score
approached the upper or lower limit of the test.

Another  method  of  determining  the  grades  for  which  a 
test  is  most  useful  is  to  find  the  age  norm  for  half  the  total 
score  of  the  test.  This  can  be  done  by  looking  up  the  norms. 
The  test  is  most  effective  as  a measuring  instrument  near 
this age. The average score of a group should be about one-half
of the total possible score on a test, if the test is to be
effective. This statement is based on research findings of
Symonds(19) and T. G. Thurstone(20) dealing with the
difficulty of tests and test items. Symonds showed mathematically
that to obtain the highest validity one should select
“items as nearly as possible of the same difficulty, which can
be answered by the average pupil with fifty per cent accuracy”(21).

Taking the Terman Test again as an example, the total score
is 220, one half of which is 110. A score of 110 according to
the norms is equivalent to a mental age of 14 years, 10 months.
Since 14 years, 10 months is near the average age for ninth
grade pupils, it can be seen that according to the criterion
the Terman Test will be most effective as a measuring instrument
in the ninth grade.

The  fourth  general  principle  in  selecting  intelligence  tests 

18. Terman, L. M. Manual of Directions for the Terman Group Test of
Mental Ability. Yonkers-on-Hudson, New York: World Book Company, 1920,
p. 10.

19. Symonds, Percival M. “Choice of Items for a Test on the Basis of Difficulty.”
Journal of Educational Psychology, 20:481-493, October, 1929.

20. Thurstone, Thelma Gwinn. “The Relation between the Difficulty of a
Test and its Diagnostic Value.” Abstracted Doctor’s Thesis, University of Chicago,
Abstracts of Theses, Humanistic Series, 5:97-102, 1926-1927.

21. Symonds, Percival M. op. cit. p. 433.




may  be  stated  that,  other  things  being  equal,  tests  should 
be selected which are simply given, easily and rapidly scored,
and  relatively  inexpensive. 

These  various  methods  give  some  basis  for  judging  the 
effectiveness  of  tests  for  the  grades  in  which  it  is  desired  to 
use them. They are not absolute rules to be followed but suggestions
of methods which help. Lists of suitable intelligence
tests  can  be  obtained  from  the  references  given  at  the  end 
of  Chapter  III  or  by  writing  to  the  various  publishers. 

In  studying  the  practice  of  schools  as  reported  in  the  check 
lists,  it  was  found  that  there  were  twenty-three  different 
intelligence  tests  which  were  mentioned  414  times.  Nearly 
200,000  tests  were  reported  as  given  but  there  were  many 
tests  mentioned  for  which  the  number  was  not  included.  This 
amount of testing was done in one year between February
1931 and February 1932.

To  obtain  a measure  of  the  popularity  of  the  tests,  they 
were ranked according to number of times mentioned and
number of copies reported given. This ranking is presented
in Table XII for the highest five tests. Five only were included
because  it  was  felt  that  considering  the  small  number  of 
times  the  others  were  mentioned,  an  injustice  in  placement 
might  be  made.  All  of  the  tests  included  in  the  table  were 

TABLE XII

Ranking of Group Intelligence Tests According to Frequency
of Mention and Number of Tests Given

                        Junior      Senior      Six Yr.     Total
Test                    M*   No.*   M    No.    M    No.    M    No.
Otis**                  1    1      1    1      1    1      1    1
Terman***               2    2      2    2      2    2      2    2
National***             3    5      3    6      3.5  7      3    5
Kuhlman-Anderson†       4    4      6    4      3.5  4      4    4
Detroit Adv.‡           8    3      4    3      5    5      5    3

*M column ranks the tests according to frequency of mention and No. column
according to the number of tests reported as given between Feb. 1931
and Feb. 1932.

**Otis includes the Otis Classification Tests; Otis Group Intelligence Scale,
Advanced Examination; and the Otis Self Administering Tests of Mental Ability,
all of which are published by the World Book Co.

***Published by the World Book Company.

†Published by the Educational Test Bureau.

‡Published by the Public School Publishing Company.




mentioned  more  than  twenty  times  while  the  next  test  on 
the  list  was  mentioned  only  fourteen  times. 

There  are  several  things  concerning  Table  XII  which  need 
to  have  special  attention.  The  Otis  includes  three  tests  which, 
due  to  incomplete  reports,  could  not  be  separated.  This  fact 
means  that  the  Terman  Group  Test  of  Mental  Ability  is  the 
single  test  most  used  in  the  secondary  schools  today.  Another 
interesting  fact  is  the  similarity  of  ranking  for  the  last  three 
tests. By observing in Table XII the material under the column
headed “Total,” it is apparent that if the ranks under the
number of times mentioned and the number of copies given
are averaged, the last three tests will occupy the same position.
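The averaging referred to can be verified from the “Total” columns of Table XII. In the sketch below the ranks are transcribed from the table; the dictionary and variable names are illustrative assumptions.

```python
# (rank by frequency of mention, rank by number of copies given),
# taken from the "Total" columns of Table XII.
TOTAL_RANKS = {
    "Otis": (1, 1),
    "Terman": (2, 2),
    "National": (3, 5),
    "Kuhlman-Anderson": (4, 4),
    "Detroit Adv.": (5, 3),
}

# Average the two ranks for each test.
average_rank = {
    test: (mention + copies) / 2
    for test, (mention, copies) in TOTAL_RANKS.items()
}

for test, rank in average_rank.items():
    print(f"{test}: average rank {rank}")
```

The last three tests share one average rank, so that neither measure alone should be taken as fixing their order.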

It  should  be  emphasized  that  this  list  does  not  constitute 
a ranking  of  the  tests  on  the  basis  of  merit  or  value  for  use 
in  the  secondary  schools.  Such  is  decidedly  not  the  case.  The 
outstanding  proof  of  this  contention  is  in  the  placement  of 
the National Intelligence Scale. The National Intelligence
Scale ranks third in frequency of mention and, from the standpoint
of number given, fifth in the junior high school, sixth
in the senior high school and seventh in the six year schools.
Such  usage  throws  grave  doubts  on  the  efficacy  of  at  least 
some  of  the  testing  programs  as  they  are  now  conducted. 
The  National  Intelligence  Scale  should  not  be  used  beyond 
the  sixth  grade,  for  beyond  this  point  it  will  not  function 
correctly  for  the  more  superior  pupils.  This  statement  is  based 
on  data  from  a study  by  McAnulty(22)  and  on  statements 
in  the  manual  of  the  National  Intelligence  Test. 

In  comparing  mental  ages  obtained  from  the  National  Test 
with  those  from  the  Terman  on  the  same  group  of  pupils, 
McAnulty  found  that  the  National  results  were  consistently 
three  months  below  the  Terman  until  a mental  age  of  159 
months,  or  13  years,  3 months,  was  reached  when  the  norms 
became  identical.  From  there  the  National  showed  a much 
higher  mental  age  for  an  equivalent  score  until  at  the  upper 
limit  of  the  data  given  a mental  age  of  14  years,  2 months  on 
the  Terman  was  equivalent  to  15  years,  5 months  on  the 
National.  Thus  a child  of  eleven,  having  a M.  A.  of  14-2  on 

22. McAnulty, E. A. “A Comparison of the Terman, National and Stanford
Binet Tests.” Educational Research Bulletin, Los Angeles City Schools, 8:5-7,
October, 1928.




the Terman would have an I. Q. of 140 while the same pupil
taking the National would have a M. A. of 15-5 and an I. Q.
of 153. A recent revision of the National norms has corrected
this  variation  of  the  extremes  somewhat  but  there  is  still 
some  difference.  The  manual  reads  as  follows,  “Mental  age 
equivalents  for  scores  lower  than  46  or  higher  than  136  (175 
months  or  14  yrs.  7 mo.)  may  be  estimated,  but  these  involve 
questionable  assumptions,  are  artificial  and  do  not  accord 
with  the  method  of  deriving  the  mental  ages”  (23). 
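The comparison of quotients earlier in this paragraph can be checked with the ratio I. Q. = 100 × M. A. ÷ C. A. The sketch below is an arithmetic check only. The chronological age of 121 months is an assumption: it is the age at which the two mental ages quoted reproduce, to within a point, the quotients of 140 and 153, whereas an age of exactly eleven years (132 months) gives quotients nearer 129 and 140. On either reckoning the National rates the same pupil some eleven to thirteen points higher than the Terman.

```python
def iq(mental_age_months, chronological_age_months):
    """Ratio I.Q.: 100 times mental age over chronological age."""
    return 100.0 * mental_age_months / chronological_age_months


TERMAN_MA = 12 * 14 + 2     # 14 years, 2 months
NATIONAL_MA = 12 * 15 + 5   # 15 years, 5 months

ASSUMED_CA = 121            # assumption: about 10 years, 1 month
ELEVEN_YEARS = 12 * 11      # exactly eleven years

for label, ca in (("assumed age, 121 months", ASSUMED_CA),
                  ("exactly eleven years", ELEVEN_YEARS)):
    terman_iq = iq(TERMAN_MA, ca)
    national_iq = iq(NATIONAL_MA, ca)
    print(f"{label}: Terman {terman_iq:.1f}, National {national_iq:.1f}, "
          f"difference {national_iq - terman_iq:.1f}")
```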

The  National  Intelligence  Test  can  be  used,  when  I.  Q.s  of 
over  140  are  not  expected,  with  pupils  having  an  average  age 
of  11  years  and  4 months.  If  very  high  mental  ages  are  not 
expected  due  to  the  nature  of  the  group  or  these  pupils  are 
to  be  retested  on  a more  advanced  test,  the  National  can  be 
used  with  still  older  children.  It  would  be  entirely  possible 
to  use  the  National  in  more  advanced  grades  if  better  norms 
for  scores  beyond  136  were  provided. 

Tests  should  be  selected  with  as  much  care  as  other  school 
material  and  in  most  cases  more  care,  for  decisions  affecting 
the  child’s  future  are  based  on  their  results.  If  the  persons 
who  select  the  tests  do  not  have  the  necessary  knowledge  to 
choose  them,  it  would  be  far  better  to  consult  some  reliable 
source  of  information. 

7.  How  Comparable  Are  Results  of  Different  Tests? 

Many  schools  use  two  or  three  different  intelligence  tests 
and  record  the  results  on  the  records  in  the  form  of  I.  Q.s. 
For instance, Mary might be given a Terman test at the beginning
of the year, the result being an I. Q. of 110. John entered
school a month after the test was given, so he was given an
Otis Self-Administering Test and obtained an I. Q. of 110.
Most schools assume that these I. Q.s are equal, but are they?

There  have  been  a number  of  studies  which  have  been 
directed  at  answering  the  problem.  Stenquist(24)  in  1921 
seems  to  have  been  the  first  to  suggest  that  such  I.  Q.s  might 
not  be  comparable.  He  was  followed  by  Gates (25)  in  1923 

23.  Manual  of  Directions  for  Scale  A,  National  Intelligence  Tests.  Yonkers-on- 
Hudson,  New  York:  World  Book  Company,  192’),  p.  6. 

24. Stenquist, J. L. “Unreliability of Individual Scores in Mental Measurements.”
Journal of Educational Research, 4:347-354, December, 1921.

25. Gates, A. I. “The Unreliability of the M. A. and I. Q. Based on Group
Tests of General Mental Ability.” Journal of Applied Psychology, 7:93-100,
March, 1923.




and Miller(26) in 1924. The problem was reopened by Kefauver(27)
in 1929, who gave a table using Miller’s data for
equating I. Q.s of a number of group tests. Since this study
there have appeared articles by Cole(28), Cattell and Gaudet(29),
Cattell (1930)(30), Steckel(31), and Cattell
(1931)(32). Recently the World Book Company has published
a pamphlet by Runnels entitled Manual for Determining
the Equivalence of Mental Ages Obtained from Group
Intelligence Tests which supplies tables for equating the mental
ages of some of the group tests.

The results of these studies have all indicated that the
I. Q.s from the different tests vary a great deal. Stenquist
found that one person received an I. Q. of 138 on the National A,
180 on the National B, 176 on the Otis Group, and 174 on the
Haggerty Delta 2; another individual received I. Q.s of
160, 148, 120, and 155 respectively. These are two extreme
cases and show little consistency. Data from Kefauver’s table
of equivalents are presented in Table XIII for three levels
equivalent to Terman I. Q.s of 60, 100, and 140.

According  to  the  material  in  Table  XIII  a pupil  getting  an 
I.  Q.  of  60  on  the  Terman  would  receive  I.  Q.s  ranging  from 
31 to 77 if given other tests. Similarly for a Terman I. Q. of
100,  the  range  is  from  98  to  114  and  for  one  of  140  the  range 
is  from  136  to  173.  There  is  much  more  variation  at  the  high 
and  low  levels  than  at  the  mean  (100  I.  Q.). 
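
The ranges just quoted can be checked directly against Table XIII. The following is a minimal sketch in Python, offered only as an illustration: the figures are copied from Table XIII, while the names TABLE_XIII, TERMAN_LEVELS, and equivalent_ranges are hypothetical labels introduced here and are not part of Kefauver's study.

```python
# Equivalent I. Q.s copied from Table XIII (Low, Average, High columns).
# The Terman row itself is left out because it defines the three levels.
TABLE_XIII = {
    "Army Alpha": (59, 104, 149),
    "Dearborn II C": (69, 104, 139),
    "Haggerty Delta 2": (45, 103, 161),
    "Illinois General": (61, 109, 157),
    "Miller A": (31, 101, 172),
    "Miller B": (56, 114, 173),
    "Pressey Classification": (46, 98, 148),
    "Otis Group Adv.": (77, 109, 139),
    "Otis S. A. Higher": (70, 103, 137),
    "Otis S. A. Inter.": (66, 102, 136),
}
TERMAN_LEVELS = (60, 100, 140)


def equivalent_ranges(table):
    """Return {terman_iq: (lowest, highest)} over the other group tests."""
    ranges = {}
    for column, terman_iq in enumerate(TERMAN_LEVELS):
        values = [row[column] for row in table.values()]
        ranges[terman_iq] = (min(values), max(values))
    return ranges


for level, (low, high) in equivalent_ranges(TABLE_XIII).items():
    print(f"Terman I. Q. {level}: {low} to {high} (spread of {high - low} points)")
```

The spread is 46 points at the low level, 16 at the average level, and 37 at the high level, which agrees with the statement that the variation is greater at the extremes than at the mean.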

These  studies  have  shown  that: 

1. The intelligence quotients and mental ages vary for the different group
tests and there is need of equating them.

2. Even though the median I. Q.s of different tests compare favorably, they
may vary a great deal at the extremes.

26. Miller, W. S. “The Variation and Significance of Intelligence Quotients
Obtained from Group Tests.” Journal of Educational Psychology, 15:359-366,
September, 1924.

27. Kefauver, Grayson N. “Need of Equating Intelligence Quotients Obtained
from Group Tests.” Journal of Educational Research, 19:92-101, February,
1929.

28. Cole, Robert D. “A Conversion Scale for School Intelligence Tests.”
Journal of Educational Research, 20:190-198, October, 1929.

29. Cattell, Psyche and Gaudet, Frederick J. “The Inconstancy of the I. Q.s
as Measured by Repeated Group Tests.” Journal of Educational Research,
21:21-28, January, 1930.

30. Cattell, Psyche. “Comparability of I. Q.s Obtained from Different Tests
at Different I. Q. Levels.” School and Society, 31:437-442, March 29, 1930.

31. Steckel, Minnie L. “The Re-standardization of I. Q.s of Different Tests.”
Journal of Educational Psychology, 21:278-283, April, 1930.

32. Cattell, Psyche. “Why Otis ‘I. Q.’ Cannot be Equivalent to the Stanford
Binet I. Q.” Journal of Educational Psychology, 22:599-603, November, 1931.




One  implication  that  these  findings  have  for  practice  is  that 
the  test  record  card  should  show  the  name  of  the  test  used. 
They also indicate that it is advisable, in case two tests are
being used, to equate the results by means of tables
(Kefauver or Runnels) in terms of the test which is used most
frequently.


TABLE XIII

I. Q.s Equivalent to Terman I. Q.s of 60, 100, and 140 on Several
Group Intelligence Tests for Three Levels of Ability†

                                   Equivalent I. Q.s
Tests                          Low      Average      High

Terman                          60        100         140
Army Alpha                      59        104         149
Dearborn II C                   69        104         139
Haggerty Delta 2                45        103         161
Illinois General                61        109         157
Miller A                        31        101         172
Miller B                        56        114         173
Pressey Classification          46         98         148
Otis Group Adv.                 77        109         139
Otis S. A. Higher               70        103         137
Otis S. A. Inter.               66        102         136

†Adapted from Kefauver, Grayson N. “Need of Equating Intelligence
Quotients Obtained from Group Tests.” Journal of Educational Research,
19:92-101, February, 1929.


Another  question  arises  in  connection  with  this  problem. 
Assuming  that  the  tests  are  not  comparable,  would  results 
be  improved  by  giving  two  group  tests  and  averaging  the 
results instead of giving one? Brooks(33) conducted a
comprehensive study on this problem which furnishes an answer
to the question. He gave the Binet individual test and nine
group intelligence tests to 108 pupils just entering junior
high school. The main criterion was obtained by weighting the
results of the Binet by 9 and each group test by 1. The median
difference from this criterion in terms of I. Q. points was
calculated for each test and each pair of tests. The five pairs
and the five single tests showing the least difference from the
criterion I. Q. are presented in Table XIV.
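
The weighting just described can be sketched as follows. This is only an illustration under stated assumptions: it assumes that the criterion is a weighted mean (Binet counted 9 times, each group test once), that the I. Q. of a pair is the plain average of the two tests, and that differences are taken without regard to sign. The function names and the sample pupils are hypothetical and are not taken from Brooks.

```python
from math import comb
from statistics import median


def criterion_iq(binet_iq, group_iqs):
    """Weighted mean in which the Binet counts 9 and each group test counts 1."""
    return (9 * binet_iq + sum(group_iqs)) / (9 + len(group_iqs))


def pair_iq(first_iq, second_iq):
    """Assumed treatment of a pair of tests: the simple average of the two I. Q.s."""
    return (first_iq + second_iq) / 2


def median_difference(test_iqs, criterion_iqs):
    """Median absolute difference between a test (or pair) and the criterion."""
    return median(abs(t - c) for t, c in zip(test_iqs, criterion_iqs))


# Hypothetical pupil: Binet I. Q. of 100 and an I. Q. of 110 on all nine group tests.
print(criterion_iq(100, [110] * 9))

# Three hypothetical pupils compared with their criterion I. Q.s.
print(median_difference([100, 110, 95], [105, 104, 100]))

# Nine group tests taken two at a time.
print(comb(9, 2))
```

With nine group tests, the Binet carries half of the total weight, so the criterion for the hypothetical pupil falls midway between 100 and 110. Nine tests taken two at a time give thirty-six possible pairs.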

In  considering  Table  XIV  it  appears  that  the  use  of  a pair 

33. Brooks, F. D. “The Accuracy of Intelligence Quotients from Pairs of
Group Tests in the Junior High School.” Journal of Educational Psychology,
18:173-186, March, 1927.




of tests is superior to the use of a single test, but the table
does not show that out of the thirty-six pairs of tests, twenty
pairs have a median error as large as or larger than the
Haggerty Test, which has the smallest error for a group test.
There were even thirteen pairs with a median error larger
than Dearborn C, which ranked fifth among the nine tests used.


TABLE XIV

Differences Between I. Q.s for Some Combinations of
Group Intelligence Tests and the Criterion I. Q.s*

Combination                      Median Difference in I. Q.

Haggerty-Illinois                          3.46
Dearborn C-Illinois                        3.75
Dearborn C-Terman                          3.83
Illinois-Terman                            3.94
Haggerty-Terman                            4.00
Haggerty (Delta 2)                         5.33
Terman                                     5.50
Mean of the 36 pairs                       5.53
National A                                 5.57
Illinois                                   5.89
Dearborn C                                 6.00
Mean of the 9 single tests                 7.07

*Adapted from Brooks, Ibid., pp. 175-176.

These  data  show  that  two  tests  are  not  necessarily  better 
than  one  but  that  it  depends  upon  the  comparisons  made.  If 
one  gives  the  pair  showing  least  difference,  Haggerty-Illinois, 
it  will  decrease  the  median  error  in  the  I.  Q.  only  2 points 
over  what  either  the  Haggerty  or  the  Terman  would  yield. 

Summary.  I.  Q.s  from  different  group  intelligence  tests 
are  not  comparable,  so  if  two  tests  are  used,  the  results  should 
be equated by means of tables which have been provided. The
name  of  the  test  given  should  always  be  placed  on  the  test 
record.  The  use  of  two  tests  does  not  necessarily  decrease  the 
median  error  of  the  I.  Q.,  though  by  using  the  best  pair  as 
compared  to  the  best  single  test,  the  error  can  be  decreased 
2 points. 


8.  Who  Should  Give  Intelligence  Tests? 

There  are  wide  differences  of  opinion  as  to  who  should 
give  intelligence  tests,  especially  group  tests.  Authorities  are 
agreed  that  only  a trained  examiner  should  be  allowed  to  give 




individual  intelligence  tests.  However,  some  advocate  that 
teachers  should  be  trained  for  that  purpose  while  others  feel 
that  such  work  should  be  left  to  the  trained  psychologists. 
Dickson  and  Martins  (34)  have  shown  that  teachers  can  be 
trained  in  service  so  the  results  of  the  Binet  tests  vary  little 
from  those  of  a professional  examiner.  With  the  increasing 
use  of  counselors  or  guidance  workers  in  the  schools,  it  would 
seem  more  economical  to  train  such  persons  in  individual 
testing  rather  than  attempt  to  make  use  of  teachers. 

In the schools studied it was found that individual
intelligence tests are given by trained people in most cases. In
sixty-five per cent of the schools using individual intelligence
tests, such tests were given by a psychologist, counselor, member
of the research department, or by some combination of the
three. In sixteen per cent they were given by a trained teacher
together with a psychologist, counselor, or member of the
research department.

In  11%  of  the  schools  the  principal  gave  the  tests.  This 
seemingly  high  percentage  was  largely  due  to  the  practice 
found  in  the  schools  in  towns  under  25,000.  Of  the  schools 
in  these  towns  which  reported  they  had  given  individual  tests, 
45%  had  such  tests  given  by  the  principal.  There  is  no  means 
of  telling  how  well  trained  these  principals  were  in  the  admin- 
istering of  these  tests,  so  one  cannot  judge  as  to  the  efficacy 
of  this  testing. 

The  fact  that  such  a large  per  cent  of  the  principals  in  small 
communities  do  give  individual  tests  has  definite  implica- 
tions for  the  training  program  of  such  principals.  Most  of  the 
individual  intelligence  tests  are  given  as  a means  of  obtaining 
more information about some type of problem or mal-adjusted
pupil  (see  Table  XI  for  a list  of  reasons).  It  should  be  empha- 
sized that  the  individual  test  is  only  one  means  of  diagnosis 
and  that  there  are  other  methods  which  are  just  as  important. 
These  data  would  seem  to  lead  to  the  conclusion  that  some- 
where in  the  training  program  of  high  school  principals,  es- 
pecially the  ones  who  will  probably  go  into  small  communi- 
ties, there  should  be  an  adequate  course  in  the  diagnosis  and 
treatment  of  problem  pupils  of  all  types.  Training  in  the  giv- 

34.  Dickson,  V.  E.  and  Martins,  E.  H.  “Training  Teachers  for  Mental  Test- 
ing in  Oakland,  California.”  Journal  of  Educational  Research,  7:100-108,  Febru- 
ary, 1923. 




ing  of  individual  intelligence  tests  would  be  only  part  of  such 
a course. 

The  administering  of  group  tests  does  not  require  the  speci- 
alized training  of  Binet  testing  and  there  is  general  agreement 
that  teachers  can  be  taught  to  administer  group  tests  success- 
fully. The  only  point  of  difference  is  whether  such  procedure 
is  economical.  Symonds  has  discussed  the  administration  of 
tests  for  purposes  of  guidance  rather  completely.  He  recom- 
mends: 

“Tests for guidance, on the other hand, are better left to the guidance
counselor or to the person who is responsible for the guidance program. In the first
place, the administration of these tests often requires specialized skill which only
a trained person possesses. For instance, some tests require careful timing,
while others are difficult to score. Many of these tests and questionnaires may
be most economically given to large groups gathered in the assembly hall or
lunchroom; others must be given to one pupil at a time. In the second place,
many of these questionnaires call for confidential information which should
be in the possession only of the individual who is trained to make correct
interpretation of it. If guidance is the function of every teacher, then every
teacher should be intrusted with this information. On the other hand, many
are beginning to realize that guidance is not something which anyone can
‘pick up’ but that it requires specialized training. In the third place, tests used
for guidance should be interpreted in relation to one another. Someone should
see the child as a whole, rather than as a learner of subject matter. For these
reasons, it seems desirable that the administration of tests for guidance be
intrusted to the specialist in guidance” (35).

Baltimore  (36)  uses  a system  of  teacher  examiners  in  the 
elementary  school  which  might  be  suggestive  of  possibilities 
for  the  secondary  schools.  The  examiners  meet  once  a month 
at  which  time  the  tests  to  be  given  and  the  results  of  pre- 
vious testing  are  discussed.  The  value  of  such  a corps  of 
examiners  to  a large  city  lies  in  the  speed  and  efficiency  with 
which  testing  can  be  done. 

A study by Keys furnishes some experimental evidence
which gives a tentative basis on which to judge what practice
should be followed. He finds that,

“Tests given under public school conditions by regular teachers trained for
the purpose, to children scattered over more than thirty class sections in three
widely separated schools, were found to yield correlations closely comparable

35.  Symonds,  Percival  M.  “The  Testing  Program  for  the  High  School.” 
School  Review,  40:105-106,  February,  1932. 

36. Stenquist, John L. “Getting Research in Practice in a Large School
System.” American School Board Journal, Vol. 19, November, 1930, pp. 41-42,
December, 1930, pp. 41-42.



with those obtained by experienced psychologists on retests of a single class
or school. The evidence to this effect is one of the most encouraging outgrowths
of the present study, for the promise it holds of results to be anticipated
from the widespread use of standard tests” (37).

The evidence which Keys presents indicates that teachers
get practically the same results when they give standardized
achievement or group intelligence tests as do special
examiners. This means that the issue as to who shall give the tests
must be decided on some other basis than that of capability
to test.

Summary. Individual intelligence tests should be given
by a trained worker. Considering the usual school situation,
it would seem advisable to have such tests given by
members of the research department, psychologists,
counselors, or guidance workers in the school.

Group  intelligence  tests  can  be  given  by  classroom  teach- 
ers after  some  training,  though  in  the  interests  of  economy 
and  uniformity  it  would  appear  more  advisable  for  one  person 
in the school to administer such tests. This person may well
be the principal in small schools, or the vice-principal or
counselor. In some cities where there are research departments
it has been found advisable to have a teacher in each
building trained to do the examining.

There  seems  to  be  need  for  the  principal  in  the  small  town 
to  be  trained  in  giving  individual  intelligence  tests.  This 
would  be  one  phase  of  the  course  for  the  study  of  problem 
cases  recommended  for  such  principals.  Further  suggestions 
have  been  made  in  section  5. 

9.  Who  Should  Score  the  Tests? 

The  scoring  of  intelligence  tests  is  accomplished  in  practice 
by  many  combinations  of  people.  Since  such  scoring  may  be 
considered  to  be  only  a clerical  job  to  be  accomplished  as 
quickly  and  as  cheaply  as  possible,  some  of  the  combinations 
found  are  most  surprising.  In  nearly  two-fifths  (38%)  of  the 
schools,  the  teachers  do  all  of  the  scoring.  Assuming  that  it 
takes  five  minutes  to  correct  a test  and  there  are  seven  hours 
in  a teaching  day,  it  took  those  teachers  905  teaching  days  to 
score  their  share  of  the  200,000  intelligence  tests.  Since  there 

37. Keys, Noel. The Improvement of Measurement Through Cumulative
Testing. New York: Teachers College Bureau of Publications, Columbia
University, Contributions to Education No. 321, 1928, p. 79.




is no value to the teacher in correcting intelligence tests, such
procedure is a waste of time and money. The situation is
really worse than reported, for in an additional 16 per cent of
the schools the tests were scored by either counselors, certain
trained teachers, a combination of teachers and clerks, or a
committee of teachers.
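
The estimate of 905 teaching days can be reproduced with a short calculation. The sketch below is illustrative only. It assumes that the teachers' share of the 200,000 tests equals the share of schools in which teachers did all of the scoring (38 per cent); the text does not state this, although the assumption yields the figure given. The constant and function names are hypothetical.

```python
TOTAL_TESTS = 200_000
TEACHER_SCHOOLS_PER_CENT = 38   # assumption: share of tests equals share of schools
MINUTES_PER_TEST = 5            # scoring time assumed in the text
HOURS_PER_TEACHING_DAY = 7      # length of teaching day assumed in the text


def teaching_days(tests, minutes_per_test, hours_per_day):
    """Convert a number of tests to be scored into teaching days of work."""
    return tests * minutes_per_test / 60 / hours_per_day


teacher_tests = TOTAL_TESTS * TEACHER_SCHOOLS_PER_CENT // 100
days = teaching_days(teacher_tests, MINUTES_PER_TEST, HOURS_PER_TEACHING_DAY)
print(teacher_tests, round(days))
```

Under these assumptions the teachers scored 76,000 tests, which at five minutes each comes to about 905 seven-hour days.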

Another  waste  that  occurred  in  5%  of  the  schools  was  caused 
by  the  principal  or  vice-principal  correcting  these  tests.  This 
procedure  was  largely  limited  to  the  schools  in  small  towns, 
but  nevertheless  it  would  appear  to  be  a poor  practice. 

More  economical  methods  consisted  in  having  the  tests 
scored  in  19%  of  the  schools  by  the  research  departments 
(presumably  by  clerical  help  in  the  research  office),  in  12% 
by  clerks,  and  in  3%  by  students.  In  the  remaining  7%  of 
the  schools  the  tests  were  scored  by  various  combinations  of 
persons. 

There is nearly as much disagreement among the opinions
expressed in the literature as there is in practice.
Some  feel  that  the  teachers  should  score  the  tests,  while 
others  are  insistent  that  such  scoring  be  done  either  by  ad- 
vanced pupils  under  supervision  or  by  clerks.  There  is  no 
definite  answer  to  the  problem.  Each  school  must  work  out 
a plan  which  will  be  satisfactory  to  its  own  situation.  The 
decided  lack  of  value  that  such  scoring  has  for  the  teacher 
should  be  considered  when  making  this  plan. 

Errors made in scoring. There is some question as to what
extent the scoring of tests should be checked a second time.
This problem has been studied by Pintner(38), Madsen(39),
Dearborn and Smith(40), and Johnson(41). Pintner, using
an experimental blank made up of many doubtful, ambiguous
responses, found, “A wide range of scores allotted by
students. The deviation from the ‘true score’ is very great.
Discussions of the principles of scoring and the checking of each

38. Pintner, R. “Accuracy in Scoring Group Intelligence Tests.” Journal
of Educational Psychology, 17:470-475, October, 1926.

39. Madsen, I. N. “Participation in Testing Programs by the Classroom
Teacher.” Educational Administration and Supervision, 15:117-126, February,
1929.

40. Dearborn, W. F. and Smith, C. W. “The Results of Re-scoring Five
Hundred Thirty Dearborn Tests.” Journal of Educational Psychology, 20:177-
183, March, 1929.

41. Johnson, T. A. “Errors in Intelligence Test Scoring.” Unpublished
master’s thesis on file in the library of Teachers College, Columbia University,
1930.




others’ papers during a term’s work on intelligence testing
tends to reduce these deviations considerably” (42).

Combining  the  results  of  the  four  studies  it  is  found  that 
the  most  frequent  errors  are  1.  misunderstanding  of  scoring 
directions,  2.  differences  in  subjective  estimates,  3.  observa- 
tional errors  in  counting,  4.  adding  totals,  5.  getting  sub- 
totals of  tests,  6.  entering  scores  on  the  front  of  the  booklet, 
7.  errors  in  R-W  computations,  8.  neglect  to  multiply  by 
weights,  9.  converting  scores  into  subject,  ages  or  grade 
scores,  10.  crediting  items  marked  correctly  by  chance  when 
the  pupil  marks  the  same  response  throughout  the  test,  and 
11.  carelessness. 

In  light  of  these  findings  it  would  seem  advisable  to  sample 
through  the  tests  to  see  if  the  individual  items  have  been  cor- 
rectly scored  and  then  check  all  other  processes  which  are 
performed  to  obtain  the  results  in  final  form.  This  checking 
should  be  done  preferably  by  some  other  person  and  includes 
checking  the  totals  of  each  test  and  the  grand  total,  all  R-W 
computations,  all  weightings,  all  copying  of  scores,  and  all 
transmuting  of  totals  into  age  or  grade  scores. 

Summary. It seems advisable to have intelligence tests
scored by clerks or advanced pupils whenever possible.
Studies of errors made in scoring intelligence tests indicate that
it is advisable to perform twice every operation connected
with scoring a test except the scoring of the individual
items. These items should be sampled to see that no consistent
errors exist.

10.  In  What  Form  Should  Test  Results  be  Made  Available? 

The  first  result  that  is  obtained  from  an  intelligence  test  is 
the  score.  This  score  is  commonly  referred  to  as  the  raw  score 
or  point  score.  On  some  tests  it  is  derived  by  counting  the 
number  of  items  a pupil  had  correct,  while  on  others  each 
subtest  is  weighted  and  then  the  total  is  found.  Such  a raw 
score  is  meaningless  by  itself.  It  needs  to  be  interpreted  in 
terms  of  some  common  unit  which  most  people  understand 
and  also  into  a unit  which  is  used  with  other  tests,  so  compara- 
tive results  can  be  obtained.  The  most  commonly  used  bases 
for  reporting  these  scores  are  intelligence  quotient  (I.  Q.), 
mental  age  (M.  A.),  letter  ratings,  and  percentile  scores. 


42.  Pintner,  R.  op.  cit. 




A mental  age  of  12  years  is  equivalent  to  the  score  obtained 
by  the  average  12  year  old  pupil  on  the  test.  Most  intelligence 
tests supply mental age norms. In the elementary school there
is  no  difficulty  in  using  these  age  norms  as  a basis  for  inter- 
preting test  results.  The  high  school  presents  a different  situ- 
ation which  raises  the  questions  discussed  in  this  section  and 
the  next. 

Mental growth proceeds at a fairly constant rate until early
adolescence is reached, when it begins to slope off,
gradually becoming a plateau. It is this growth curve which is the
cause  of  the  difficulty.  If  mental  growth  continued,  the  same 
transmuted  measures  could  be  used  as  in  the  elementary 
school,  but  since  it  does  not,  a question  has  arisen  as  to  what 
should  be  used. 

The intelligence quotient or I. Q. is found by dividing the
mental age by the chronological age; thus I. Q. = (M. A. ÷ C. A.) × 100.
It  provides  a measure  of  brightness:  for  instance,  if  a child  has 
the  same  mental  age  as  the  average  child  of  his  age,  he  would 
have  an  I.  Q.  of  100.  If  he  is  brighter  than  the  average,  his 
I.  Q.  would  be  above  100  and  if  duller,  below  100.  Since  the 
I.  Q.  is  dependent  upon  the  M.  A.,  it  has  the  same  defects 
for use in the high school as the M. A. Where either of them
is used, the mental age and I. Q. should be used together,
for  the  mental  age  tells  what  level  of  intelligence  the  student 
has  reached  and  the  I.  Q.  furnishes  a measure  of  his  relative 
brightness  for  children  of  his  age.  Both  concepts  are  necessary 
to  interpret  the  pupil’s  ability. 
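
The computation described above may be sketched in a few lines. This is a minimal illustration, not a procedure taken from any test manual; the function name and the choice of expressing both ages in months are assumptions made here for convenience.

```python
def intelligence_quotient(mental_age_months, chronological_age_months):
    """I. Q. = (M. A. / C. A.) x 100, with both ages expressed in months."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_months / chronological_age_months


# Hypothetical child aged 10 years (120 months) with a mental age of 12 years.
print(intelligence_quotient(144, 120))

# A child whose mental age equals his chronological age.
print(intelligence_quotient(120, 120))
```

The first child obtains an I. Q. of 120 and the second an I. Q. of 100. The mental age of 12 years tells the level reached; the I. Q. of 120 tells how bright the child is for his age.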

Percentile  scores  provide  a simple  means  of  placing  the 
pupil  in  relation  to  either  the  other  pupils  to  whom  the  test 
was  given,  or  to  pupils  within  his  grade  or  age.  If  percentile 
norms  are  provided  on  an  intelligence  test,  they  usually  are 
by grades. A percentile score of 60 would indicate that this
pupil did as well as or better than 60 per cent of the pupils in his grade.
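
A percentile score of this kind can be sketched as follows. Definitions of percentile rank differ in how ties are handled; the version below assumes, for illustration only, that a pupil is credited with every score in the group that is equal to or lower than his own. The function name and the sample scores are hypothetical.

```python
def percentile_score(score, group_scores):
    """Per cent of the group whose scores are equal to or lower than `score`."""
    if not group_scores:
        raise ValueError("the comparison group must not be empty")
    at_or_below = sum(1 for other in group_scores if other <= score)
    return 100 * at_or_below / len(group_scores)


# Hypothetical raw scores for the five pupils of a small grade group.
grade_scores = [10, 20, 30, 40, 50]
print(percentile_score(30, grade_scores))
```

Here the pupil with a raw score of 30 stands at the 60th percentile of his group.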

Letter ratings are made out from either I. Q.s or percentile
ratings by the individual school systems. For instance, all
I. Q.s above 130 might be ranked A, between 110 and 130 B,
between 90 and 110 C, between 70 and 90 D, and below
70 E.
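
The illustrative scale above can be written as a small function. The text does not say to which letter an I. Q. falling exactly on a dividing point belongs; the sketch assumes that "above 130" excludes 130 and that each lower dividing point (110, 90, 70) belongs to the higher of the two letters. Both the function name and these boundary rules are assumptions.

```python
def letter_rating(iq):
    """Convert an I. Q. into the illustrative A-E letter rating."""
    if iq > 130:
        return "A"
    if iq >= 110:   # assumption: 110 through 130 inclusive
        return "B"
    if iq >= 90:    # assumption: 90 up to but not including 110
        return "C"
    if iq >= 70:    # assumption: 70 up to but not including 90
        return "D"
    return "E"


for iq in (135, 130, 100, 70, 69):
    print(iq, letter_rating(iq))
```

A school adopting such a scale would need to state its own rule for the dividing points so that all records are rated alike.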

Symonds,  in  discussing  the  problem  as  to  which  derived 




“Mental age as a unit may be safely ignored in the high school. There is no
advantage to be gained by turning scores into mental ages. All the
interpretations one wishes to make with intelligence tests scores can be easily made by
using a table of norms (percentile norms) such as is given for the ‘Terman
Group Test of Mental Ability’ ... It was stated that mental age as a unit has
several inadequacies for use in the high school. Since I. Q. is computed from
M. A. the I. Q. suffers from the same inadequacies” (43).

He  goes  on  to  state  that  the  I.  Q.  has  three  special  difficulties 
of  its  own.  First,  beyond  age  14  it  is  not  known  what  age 
should  be  used  as  the  denominator  of  the  ratio.  This  point 
will  be  discussed  in  the  next  section.  Second,  a related  diffi- 
culty is  that  it  is  not  known  definitely  when  mental  growth 
ceases.  Third,  all  persons  do  not  reach  mental  maturity  at 
the  same  age.  He  also  states  that  I.  Q.  should  not  be  used  for 
sectioning  but  rather  mental  level.  Recently  he  has  reversed 
his  opinion  on  this  last  point: 

“Data  recently  obtained  throw  new  light  on  this  issue  and  the  writer  now 
believes that the I. Q. is preferable to the score for purposes of grouping in
the  high  school”  (44). 

The  I.  Q.  is  a concept  which  has  become  so  popular  that, 
even  though  there  might  be  better  transmuted  scores,  it  is 
used as the basis for interpreting intelligence test results
in  most  of  the  secondary  schools  throughout  the  country 
today. 

Ninety-seven  per  cent  (45)  of  the  schools  which  make  re- 
sults available  to  their  teachers  use  this  form.  The  mental  age, 
or  M.  A.,  is  used  by  49%  of  the  schools,  the  raw  score  is  given 
in 34% of the cases, and the letter rating is used in 10%. Other
methods  of  reporting  the  results  include  the  use  of  “T”  scores, 
classification  index,  a composite  made  up  of  I.  Q.  and  E.  Q., 
percentile  ranks,  index  of  brightness,  mental  age  grade  scores, 
rank  in  class,  “P.  L.  R.”  (Probable  Learning  Rate,  used  in 
the  Cleveland  schools),  “X.  Y.  Z.”  ratings,  graphic  presenta- 
tion, and  interpretation  by  the  counselor. 

The  main  difference  between  the  practice  in  the  various 


43.  Symonds,  Percival  M.  Measurement  in  Secondary  Education.  New  York: 
The  Macmillan  Company,  1927,  pp.  314-319. 

44.  Symonds,  Percival  M.  “Shall  the  I.  Q.  be  Used  for  Sectioning  in  the 
High  School?”  Journal  of  Educational  Research,  24:138-140,  September,  1931. 

45.  Due  to  the  large  number  of  schools  reporting  various  combinations, 
the  number  of  times  each  method  was  reported  has  been  changed  to  a per 
cent.  Hence  the  97%  does  not  refer  to  the  per  cent  of  schools  using  only  the 
I.  Q.  but  to  the  per  cent  using  the  I.  Q.  either  alone  or  in  any  possible  combina- 
tion. 




types  of  schools  is  in  the  use  of  the  I.  Q.  as  the  sole  means  of 
making  results  available  to  teachers.  About  a fourth  (26%) 
of  the  junior  high  schools  follow  this  practice  as  compared 
with  a third  (32%)  of  the  six  year  schools  and  two-fifths  (40%) 
of  the  senior  high  schools.  Though  the  I.  Q.  is  recommended 
least  for  use  in  the  high  school,  it  is  more  often  the  sole  meth- 
od of  presenting  results  than  any  other  form. 

It  would  appear  that  the  I.  Q.  will  continue  to  be  used  in  the 
secondary  school  but  that  its  limitations  as  pointed  out  in  Sec- 
tion 7,  in  this  section,  and  in  the  one  to  follow  should  be 
understood  by  those  who  use  it. 

11. How Should I. Q.s Be Figured?

It  is  the  purpose  of  this  section  to  consider  what  age  should 
be  used  as  a maximum  denominator  in  figuring  I.  Q.s  and 
to  consider  some  of  the  mechanical  aids  in  computing  such 
I.  Q.s. 

One of the main difficulties in the use of the I. Q., as
expressed in the previous section, is due to the fact that the best
age to use as a maximum denominator is not known. In
practice there is agreement that for older pupils there should be
a single age used as the denominator and not, as is the case
with younger pupils, the pupil’s chronological age. There is
disagreement concerning what age should be used. Terman is
one of the strongest advocates for 16, and due to the fact
that 16 is the age he recommends for use in both the
Stanford Revision of the Binet and the Terman Group Test of
Mental Ability, it is undoubtedly the practice most followed.
Pintner is just as strong in urging 14 as the correct age. In
summing up the evidence from the testing of adults in the army,
Pintner stresses that:

“From these results it would look as if average adult mentality as represented
in  the  army  achieves  about  as  much  as  children  at  age  14  or  thereabout  on  our 
present  tests  of  intelligence.  For  this  reason  it  would  seem  desirable  to  use 
14  as  a basis  for  the  calculation  of  I.  Q.s  for  older  individuals”  (46). 

He presents additional evidence of the median I. Q.s for each
year  in  four  select  private  schools  and  one  public  school. 
Using  age  16  as  an  upper  limit  the  median  I.  Q.  decreases  as 
the  grades  increase.  It  would  thus  appear,  when  age  16  is 


46.  Pintner,  Rudolph  Intelligence  Testing.  New  Edition.  New  York!:  Henry 
Holt  and  Company,  1931,  p.  83. 




used,  that  the  upper  years  are  less  selective  than  the  early 
years  of  high  school.  This  is  known  not  to  be  the  case.  In  these 
private  schools,  whatever  selection  exists  is  toward  retaining 
the  better  pupils  and  dropping  the  poorer  ones.  When  age 
14  is  used  a steady  increase  is  noted  in  the  medians  from 
grade to grade. This increase is what one normally would
expect. Though this type of research does not furnish a
definite answer to the question, it is at least indicative that 14 is
better than 16.
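
The effect of the maximum denominator can be seen in a short sketch. It assumes, as the discussion above implies, that the pupil's own chronological age is used until it exceeds the chosen limit, after which the limit is used instead. The function name and the sample pupil are hypothetical.

```python
def iq_with_age_limit(mental_age_months, chronological_age_months, limit_years):
    """I. Q. using the chronological age, capped at `limit_years`, as denominator."""
    denominator = min(chronological_age_months, limit_years * 12)
    return 100 * mental_age_months / denominator


# Hypothetical senior aged 17 years (204 months) with a mental age of 16 years.
for limit in (16, 14):
    iq = iq_with_age_limit(192, 204, limit)
    print(f"upper limit {limit}: I. Q. {iq:.1f}")

# A pupil below either limit is unaffected by the choice.
print(iq_with_age_limit(144, 120, 16), iq_with_age_limit(144, 120, 14))
```

The same performance yields an I. Q. of 100 when 16 is the upper limit and about 114 when 14 is used, which shows why the choice of limit affects the medians of the upper years while leaving younger pupils unchanged.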

TABLE XV

Median I. Q.s of the Four Years of High School Calculated
Using Age 16 and Age 14 as Upper Limits*

                       Select Private Schools      Public School
Year                    a      b      c      d           e

Using Age 16 as upper limit
Freshman               111    116    117    118          91
Sophomore              111    113    113    115          94
Junior                 106    107    111    116          95
Senior                 100    114    111    112         105

Using Age 14 as upper limit
Freshman               113    116    121    121         101
Sophomore              117    124    122    123         107
Junior                 120    124    125    130         109
Senior                 123    130    127    120         120

*Adapted from Rudolph Pintner, op. cit., p. 84.


Dearborn(47) says that age 14½ should be used as an upper
limit.  The  extreme  view  is  taken  by  Thorndike.  After  care- 
fully testing  and  retesting  a large  number  of  pupils,  Thorn- 
dike concludes  that: 

“The doctrine that the ability to improve one’s score in a measure of
intelligence necessarily ceases at 14 or 16 then should be abandoned. Indeed, there
seems to be evidence that this ability improves, at least in the case of those
who are subject to intellectual education, beyond 18” (48).

The problem is still unanswered, but it seems that suitably
designed research could answer the question (49).

The  other  question  to  be  discussed  is  an  economical  means 

47.  Dearborn,  Walter  F.  “The  Intelligence  Quotients  of  Adults  and  Related 
Problems.”  Journal  of  Educational  Research,  6:307-315,  November,  1922. 

48.  Thorndike,  E.  L.  “On  the  Improvement  in  Intelligence  Scores  from 
Thirteen  to  Nineteen.”  Journal  of  Educational  Psychology,  17:73-76,  February, 
1926. 




of  computing  I.  Q.s  from  the  test  scores.  The  proposed  method 
involves the use of the Inglis Intelligence Quotient Values (50).
First,  the  tests  should  be  carefully  checked  to  see  that  the 
total  scores  are  correct.  Second,  the  ages  of  the  pupils  must 
be  calculated  in  years  and  months.  Most  tests  ask  the  questions 
— How  old  were  you  your  last  birthday?  and — When  is  your 
birthday?  From  the  first  question  the  age  is  obtained  directly. 
The  difficult  task  is  to  figure  the  months.  An  excellent  method 
is to make out a table giving the names of the months and the
number of months each is from the month in which the
testing  was  done.  To  make  this  clearer,  it  will  be  developed 
step  by  step  using  Table  XVI  to  illustrate  it. 

1.  Write  down  the  months  in  order  starting  with  January. 

2. Underline the month in which the tests were given. (In this case they were
given September 21st.)

3. Number backwards through the months starting with the month just
previous to the one in which the testing is done. August would be 1, July
would be 2, December would be 9 and October 11.

4. All the months except the one in which the tests are given are cared for.
To provide for that, write at the edge the date the test was given
(the 21st).

5. Place a 0 above the line as indicated.

6.  Place  1 yr.  below  the  line. 


TABLE XVI

A Method of Figuring the Number of Months Since
Last Birthday

Month of Birthday        No. of Months since
                         Last Birthday

January                        8
February                       7
March                          6
April                          5
May                            4
June                           3
July                           2
August                         1
September                      0         21st
----------------------------------------------
                             1 yr.
October                       11
November                      10
December                       9



The  table  is  now  ready  for  use.  A twelve  year  old  pupil 
having  a birthday  in  August  would  be  12-1,  or  12  years  1 



month  old.  A thirteen  year  old  pupil  whose  birthday  is  in 
December  would  be  13-9.  A twelve  year  old  pupil  whose 
birthday  is  on  September  24  would  be  counted  as  13-0,  while 
one  whose  birthday  was  just  before  the  tests  were  given,  say 
September  18,  would  be  12-0. 
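
The table method above can be sketched in code. What follows is a minimal Python illustration, not part of the original procedure. The function names are hypothetical, and the treatment of a birthday falling exactly on the test date (counted as already passed) is an assumption, since the text gives only the cases before and after the 21st.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def months_since_birthday_table(test_month):
    """Build the Table XVI lookup: months from each birthday month to the test month."""
    t = MONTHS.index(test_month)
    return {name: (t - i) % 12 for i, name in enumerate(MONTHS)}


def chronological_age(age_last_birthday, birth_month, test_month, test_day,
                      birth_day=None):
    """Return (years, months) for a pupil, following the Table XVI method.

    age_last_birthday is the pupil's answer to "How old were you your last
    birthday?". birth_day is needed only when the birthday falls in the
    month of testing.
    """
    if birth_month != test_month:
        return (age_last_birthday, months_since_birthday_table(test_month)[birth_month])
    if birth_day is None:
        raise ValueError("birth_day is required when the birthday is in the test month")
    if birth_day > test_day:
        # Birthday is still to come this month: count the pupil as a full year older.
        return (age_last_birthday + 1, 0)
    # Birthday has just passed (assumption: the test date itself counts as passed).
    return (age_last_birthday, 0)


# The four cases worked in the text, for tests given September 21st.
print(chronological_age(12, "August", "September", 21))                    # 12-1
print(chronological_age(13, "December", "September", 21))                  # 13-9
print(chronological_age(12, "September", "September", 21, birth_day=24))   # 13-0
print(chronological_age(12, "September", "September", 21, birth_day=18))   # 12-0
```

As in the table, a birthday in any other month is counted in whole months regardless of the day, so the result is approximate to within a month.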

After all the ages in terms of years and months have been
placed on the cover, the next step is to figure the I. Q. By
most methods the mental age score is put on each test booklet,
then the I. Q. is figured by dividing the mental age by
the chronological age. The methods presented here eliminate
the necessity of copying mental ages on each test booklet.

1. Write at the top of the pages of the Intelligence Quotient Values the
norms for the test as taken from the manual of directions of the test which is
being used. Table XVII reproduces part of one page of the values. The numbers
which are written in above the table are scores on the Terman Group Test
which correspond to the mental ages given below. The norms for the Terman
Group Test give a score of 70 as equivalent to a mental age of 12 yrs.
11 months, a score of 75 as equivalent to a mental age of 13 yrs. 2 months, and
a score of 80 as equal to 13 yrs. 5 months. So 75 is written above 13-2 and 80
is written above 13-5, and in the same manner for all the scores given in the
norms of the test. When this is complete there may appear many mental ages
which have no score written above them. These scores are filled in by
interpolation. In most cases it can be done by inspection as was done for Table XVII.
There are 5 points (80-75) to be distributed over the 13-3 and 13-4 mental ages.
Thus 13-3 will be assigned a score value of 77 and 13-4 a score value of 79.
This same procedure is followed for all the mental ages which do not have

49. Working on the assumption that the I. Q. is constant, one could, in a
situation where cumulative intelligence records were available from the
elementary period, arrive at a fairly satisfactory answer. The procedure would be
somewhat as follows:

1. Determine the I. Q., preferably from results of individual intelligence
tests, which each pupil had in the elementary school (probably between
ages of 7 and 11).

2. Separate the cases into various I. Q. levels, say all above 140 in one
group, 126 to 140 in another group, 111 to 125 in a third, 91 to 110 in a
fourth, 76 to 90 in a fifth, 60 to 75 in a sixth and below 60 in the last.
The data would be studied on each of these levels.

3. Using test records already available or by giving a new group test to
all the pupils, obtain mental ages for all the pupils.

4. Calculate present I. Q.s on all the pupils using age 13, 13½, 14, 14½,
15, 15½, and 16 as denominators respectively. Pupils younger than 13
at present would be excluded before the figuring was done for the first
series of I. Q.s. Pupils younger than 13½ would have to be excluded
before the second series was calculated, etc. No pupils younger than
the age used as the denominator could be included in the successive
calculations.

5. Compute the mean difference between the I. Q. obtained in the elementary
school and each of the successive I. Q. figures on different age levels.

6. The series of I. Q.s showing the least mean difference would indicate
the correct age to use as a maximum. The age might vary for the different
I. Q. levels.

50.  Published  by  the  World  Book  Company,  Yonkers-on-Hudson,  New  York. 




score values given in the norms (51). The Intelligence Quotient Values are now
ready to be used, and once this initial labor is completed it will not have to be
repeated as long as the same test is used.

2. To calculate the I. Q.s of the class take the first paper, which in this
case has a score of 77 and belongs to a pupil aged 11 years, 2 months. Follow
down the column headed 77 until the row labelled 11-2 is reached and the
I. Q. of 119 is read. A score of 80, age 11 years, 5 months would give an
I. Q. of 118. A difficulty arises when a score which is not given occurs, say
a score of 78, chronological age 11-0. In that case the nearest score in the
table is taken, and if the score is directly between two scores, the higher
score is taken; this pupil, then, would have an I. Q. of 121.
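
The interpolation in step 1 is done "by inspection" in the text. The following Python sketch is one hypothetical way to mechanize it. Rounding each interpolated score up to the next whole point is an assumption, chosen only because it reproduces the scores the text assigns (77 and 79) and those written across the top of Table XVII. The function names are illustrative, and the special case described in footnote 51 is not handled.

```python
def to_months(age):
    """Convert an age label such as "13-2" to a total number of months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)


def to_label(total_months):
    """Convert a total number of months back to a label such as "13-2"."""
    return f"{total_months // 12}-{total_months % 12}"


def fill_scores(norms):
    """Assign a score to every mental age between the first and last norm.

    norms maps mental-age labels to the scores printed in the test manual.
    Ages with no printed score receive a linearly interpolated score,
    rounded up (assumed rule, see the note above).
    """
    points = sorted((to_months(age), score) for age, score in norms.items())
    filled = {}
    for (m0, s0), (m1, s1) in zip(points, points[1:]):
        span = m1 - m0
        for k in range(span):
            # Ceiling of (s1 - s0) * k / span, using integer arithmetic only.
            step_up = -((-(s1 - s0) * k) // span)
            filled[to_label(m0 + k)] = s0 + step_up
    last_months, last_score = points[-1]
    filled[to_label(last_months)] = last_score
    return filled


# Norms quoted in the text for the Terman Group Test.
terman_norms = {"12-11": 70, "13-2": 75, "13-5": 80}
for age, score in fill_scores(terman_norms).items():
    print(age, score)
```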


TABLE XVII

A Section of the Inglis “Intelligence Quotient Values”
Illustrating a Rapid Method of Figuring I. Q.s

                                 MENTAL AGE

Terman Scores        72    74    75    77    79    80   [?]
Years                13    13    13    13    13    13    13
Months                0     1     2     3     4     5     6

Chronological Age
11-0                118   119   120   120   121   122   123
11-1                117   118   119   120   120   121   122
11-2                116   117   118   119   119   120   121
11-3                116   116   117   118   119   119   120
11-4                115   115   116   117   118   118   119
11-5                114   115   115   116   117   118   118
11-6                113   114   114   115   116   117   117


This  procedure  for  calculating  I.  Q.s  will  save  a great  deal 
of  time  when  a large  number  have  to  be  calculated  and 
should  be  more  accurate  because  it  eliminates  one  step  in 
the process, thus reducing the chances for errors by that
amount. If several different tests are used in the school these
figures  can  either  be  written  above  each  other  in  different 
colored  inks  or  a different  set  of  tables  used  for  each  test. 
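
The table lookup rests on the usual quotient, I. Q. = 100 × mental age ÷ chronological age, with both ages in months. The Python sketch below is an illustration under stated assumptions, not the published Inglis tables: the score-to-mental-age mapping is the portion shown in Table XVII, rounding to the nearest whole number is assumed, and the optional `max_age` argument (a hypothetical name) stands in for whichever maximum chronological age a school system adopts.

```python
def to_months(age):
    """Convert an age label such as "11-2" to a total number of months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)


# Terman Group Test scores written above the mental ages in Table XVII.
SCORE_TO_MENTAL_AGE = {72: "13-0", 74: "13-1", 75: "13-2",
                       77: "13-3", 79: "13-4", 80: "13-5"}


def nearest_tabled_score(score, table):
    """Pick the nearest tabled score; a tie goes to the higher score."""
    return min(table, key=lambda s: (abs(s - score), -s))


def intelligence_quotient(score, chronological_age,
                          table=SCORE_TO_MENTAL_AGE, max_age=None):
    """Return the I. Q. for a raw score and a chronological age label."""
    mental = to_months(table[nearest_tabled_score(score, table)])
    chrono = to_months(chronological_age)
    if max_age is not None:
        # Older pupils are divided by the chosen maximum age instead.
        chrono = min(chrono, to_months(max_age))
    # 100 * mental / chrono, rounded to the nearest whole number.
    return (200 * mental + chrono) // (2 * chrono)


# The three cases worked in the text.
print(intelligence_quotient(77, "11-2"))   # 119
print(intelligence_quotient(80, "11-5"))   # 118
print(intelligence_quotient(78, "11-0"))   # 121
```

Skipping the mental-age copying step, as the text proposes, corresponds here to going straight from the raw score to the quotient in one call.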

Summary.  There  is  disagreement  concerning  what  age 
should  be  used  as  a maximum  denominator  in  computing  the 
I.  Q.s  for  older  students.  Some  writers  say  14  and  some  say 
16  should  be  used.  Evidence  is  available  which  seems  to  dis- 
credit 16,  but  until  further  research  is  conducted,  the  problem 
will  not  be  solved.  Each  system  must  decide  which  maximum 

51.  Another  type  of  difficulty  might  arise.  This  case  would  be  where  a 
score  of  98  is  equivalent  to  a mental  age  of,  say  14-2  and  a score  of  99  is 
equivalent  to  14-5.  In  this  case  the  mental  ages  14-3  and  14-4  would  not  be 
used  at  all. 




chronological  age  is  to  be  used.  It  might  seem  advisable  to 
compromise  and  use  age  15. 

A method of figuring I. Q.s, through the use of Inglis Intelligence
Quotient Values, has been proposed which will eliminate
one step in the process as it is usually followed.

12.  How  Should  Intelligence  Test  Results  Be  Recorded? 

The thesis that a mental rating should be available for each
pupil was developed in the early part of this chapter and in
addition it seemed necessary that such a rating be obtained
by testings spaced at intervals of two or three years.
The effective operation of these principles carries with it the
implication that test records should be cumulative.

One  of  the  main  reasons  for  giving  tests  is  to  improve  the 
judgments  of  people  who  are  dealing  with  students.  Thus 
test  records  are  not  of  great  value  when  considered  separately 
but  do  become  valuable  as  supplements  of  other  data  con- 
cerning the  student.  The  implication  is  that  test  records 
should  be  incorporated  in  the  total  record  of  the  child. 

Symonds has emphasized these two theses by his statement,

“Any comprehensively conceived testing program carries with it the need for
a system  of  permanent,  cumulative  records.  Much  of  the  value  of  testing  is 
lost  because  the  significance  of  the  results  are  never  seen  as  a whole.  In  the 
first  place,  the  test  results  for  a pupil  gathered  in  any  one  year  by  teachers, 
counselors,  and  physicians  need  to  be  brought  together  so  that  a picture  of  the 
whole  child  is  presented.  In  the  second  place,  these  records  need  to  be  cumu- 
lated yearly  in  order  to  show  development.  We  lose  in  trying  to  interpret  a boy 
or  girl  by  a mere  cross-section;  a longitudinal  record  of  development  is  needed 
as  well”  (52). 

A third  consideration  in  planning  records  would  be  that 
such  records  should  be  made  available  to  those  who  are  to 
use  them.  This  requires  careful  planning  in  order  to  accomp- 
lish it  as  economically  as  possible.  This  principle  is  obvious 
and  requires  no  elaboration. 

With  these  three  principles  in  mind,  we  are  better  able  to 
study  the  practices  of  the  schools.  The  most  common  practice 
of  recording  intelligence  results  is  to  place  the  results  only 
on  the  pupil’s  permanent  record  card.  This  is  done  in  30%  of 
the  total  cases.  This  practice  is  more  common  in  the  senior 

52.  Symonds,  P.  M.  “The  Testing  Program  for  the  High  School.”  School 
Review,  40:106,  February,  1932. 




high schools (35.8%) than it is in the junior high schools
(21.5%) or in the six year schools (28.4%). If the permanent
record card is used in combination with a special test card, or
sheets, or filed tests, the per cent for all secondary schools
increases from 30% to 62%.

The  next  most  common  practice  is  to  file  the  results  on  a 
special  test  record  card;  13%  of  the  schools  use  this  plan. 
Another  plan  is  to  keep  the  results  on  sheets,  7%  following 
this  latter  procedure  exclusively.  This  practice  of  using  sheets 
has some advantages, especially when they are arranged by
home-rooms, but it would seem advisable also to have the
data on the pupil’s permanent record card. Eighteen per cent
of the schools use some other method or combination of methods.

There  are  several  weaknesses  which  seem  apparent  in 
studying  these  figures  and  practices.  The  first  is  that  only 
three-fifths (62%) of the high schools provide for both “cumulativeness
and wholeness” in their record system. Another
outstanding  weakness  that  is  not  brought  out  in  the  figures 
is  the  large  number  of  schools  which  seemed  to  record  the 
results  in  a variety  of  ways.  From  the  returns  of  some  schools 
one  wonders  if  they  did  anything  else  besides  recording  test 
data. 

Most  schools  could  follow  the  suggested  procedure  of  rec- 
ording intelligence  test  data  in  a permanent  and  unified  sys- 
tem. Secondary  schools  for  the  most  part  provide  for  a cumu- 
lative scholastic  record  (53)  and  the  necessary  data  could 
be recorded on this form. Whenever possible it is desirable to
have  at  least  a brief  summary  of  the  intelligence  test  results 
from  the  elementary  school. 

The other phase of the question of how intelligence test
results should be recorded deals with the items which need
to be placed on the record. From a study of the material presented
in the other sections of this chapter, the minimum
for recording includes (1) the month and year in which the
test was given, (2) the name of the test, (3) the form of the test,
(4) the raw score, and (5) the I. Q. This information is absolutely
essential for the proper interpretation of results. Other

53. Koos, L. V. and Kefauver, G. N. Guidance in Secondary Schools. New
York: The Macmillan Company, 1932, Figure 24, p. 2 J.




information, such as chronological age, mental age or percentile
rank, is helpful but necessary only if the school interprets
the results in that form.
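
The minimum items named above can be pictured as one entry in a cumulative record. The Python sketch below is only an illustration; the class and field names are hypothetical, and the sample values are invented for the example and do not come from the study.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class IntelligenceTestEntry:
    # The five items the text treats as the minimum to record.
    month: int
    year: int
    test_name: str
    form: str
    raw_score: int
    iq: int
    # Helpful, but needed only if the school interprets results in these forms.
    chronological_age: Optional[str] = None
    mental_age: Optional[str] = None
    percentile_rank: Optional[int] = None


@dataclass
class PupilRecord:
    """A cumulative record: test entries are kept with the pupil, in date order."""
    pupil_name: str
    test_entries: List[IntelligenceTestEntry] = field(default_factory=list)

    def add_test(self, entry: IntelligenceTestEntry) -> None:
        self.test_entries.append(entry)
        self.test_entries.sort(key=lambda e: (e.year, e.month))

    def iq_history(self) -> List[Tuple[int, int, int]]:
        """Return (year, month, I. Q.) for each testing, oldest first."""
        return [(e.year, e.month, e.iq) for e in self.test_entries]


# Invented sample data, for illustration only.
record = PupilRecord("Pupil A")
record.add_test(IntelligenceTestEntry(9, 1933, "Terman Group Test", "A", 80, 118))
record.add_test(IntelligenceTestEntry(9, 1931, "Terman Group Test", "B", 62, 115))
print(record.iq_history())
```

Keeping the entries in one list attached to the pupil mirrors the two principles of the section: the record is cumulative, and it is not divorced from the pupil's other records.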

Whether  or  not  a special  test  card  shall  be  used  depends 
upon  the  system  of  records  in  use  in  the  school.  There  is  some 
value  in  using  such  a card  when  the  packet  folder  system  is 
in  use  or  when  all  records  are  placed  in  one  folder,  but  test 
records  should  not  be  divorced  from  the  other  records  of 
the  pupil. 

Summary.  Intelligence  test  records  should  be  cumulative, 
should  be  included  with  the  other  records  of  the  pupils,  and 
should  be  as  economically  recorded  as  possible.  Many  schools 
at  present  do  not  make  such  provisions  for  their  records.  The 
month  and  year  the  test  was  given,  the  name  and  form  of  the 
test,  the  raw  score  and  I.  Q.  constitute  a minimum  of  informa- 
tion to  be  recorded. 

13.  To  Whom  Should  the  Intelligence  Test  Results  Be 
Available? 

The question of who should have the results of individual
intelligence tests is another one of the problems which has
progressed no further than the discussion stage. There are
several possibilities in making available such results. They
may be kept strictly guarded, and used only by the administration
or counselor, or they may be given to some or all
teachers. There is also the possibility of informing pupils and
parents of the results, either in every case or in special
circumstances.

Writers  in  measurements  are  largely  of  the  opinion  that 
the  results  of  intelligence  tests  should  be  made  available  to 
teachers.  Hildreth  recommends  that:  “The  most  practicable 
scheme  is  that  of  expressing  I.  Q.s  in  terms  of  letter  ratings 
and  disclosing  them  to  teachers  when  such  information  is 
furnished  by  a reliable  test  and  when  teachers  are  handicapped 
for  lack  of  this  knowledge  in  working  with  the  child”  (54). 

In  considering  intelligence  test  results  on  the  secondary 
level  Ruch  and  Stoddard  recommend  that:  “Permanent  rec- 
ords should  be  kept  and  made  available  to  all  instructors. 

54.  Hildreth,  Gertrude  H.  Psychological  Service  for  School  Problems.  Yonkers- 
on-Hudson,  New  York:  World  Book  Company,  1930,  p.  74. 




They should be made available to the students only when it
appears that such knowledge will be of real help” (55).

Practically  all  teachers  would  like  to  have  intelligence  test 
results  on  all  their  pupils.  This  was  ascertained  by  studying 
the  replies  of  over  1,000  teachers  from  some  70  of  the  schools 
included  in  this  study.  If  such  results  are  made  available  to 
teachers,  the  responsibility  for  seeing  that  teachers  under- 
stand the  significance  of  these  data  and  use  them  to  aid  the 
pupil,  rests  with  the  administration. 

The  practice  of  informing  pupils  of  the  results  of  their  in- 
telligence tests  has  been  studied  by  Allen  (56).  He  investigated 
the  problem  by  using  material  from  the  field  of  opinion,  prac- 
tical experience  and  experimentation.  His  findings  can  be 
enumerated  as  follows  (57): 

1.  The  majority  of  school  systems  did  not  give  results  of  intelligence  tests 
to  students. 

2.  The  practice  of  supplying  intelligence  test  results  to  students  is  much 
more  frequent  on  the  college  level. 

3.  Superintendents  and  directors  of  research  in  school  systems  are  largely 
opposed  to  giving  out  such  results. 

4.  The  majority  of  teachers  do  not  favor  giving  such  results  to  high  school 
or  elementary  pupils. 

5.  A minority  of  these  teachers  stated  that  they  had  knowledge  of  injurious 
effects  on  people  from  such  information. 

6.  Parents  did  not  attach  “any  more  meaning  to  standardized  test  results 
than  they  did  to  teachers’  marks  or  opinions.” 

7.  Experiments  designed  to  discover  what  objective  effects  are  produced 
in  second  testings  due  to  a knowledge  of  previous  scores  showed  no  signifi- 
cant group  effects. 

One of the most radical proposals for making results available
to parents comes from Darsie (58). He suggests a form
which  would  report  the  child’s  intelligence  rank,  advice  as 
to  further  education  and  vocational  advice.  The  latter  was 
limited  to  suggesting  that  the  pupil  should  enter  either  the 
professional  field,  commercial-technical  occupations  or  the 
skilled  trades.  This  proposal  seems  to  contain  far  too  many 
hazards  to  put  into  operation,  at  least  at  present. 

55. Ruch, G. M. and Stoddard, G. D. Tests and Measurements in High School
Instruction. Yonkers-on-Hudson, New York: World Book Company, 1927, p. 212.

56. Allen, Clinton M. Some Effects Produced in an Individual by Knowledge
of His Own Intellectual Level. New York: Teachers College, Columbia
University, Contributions to Education, No. 401. 1930, 98 pp.

57. Summarized from Ibid. Chapter VI, “Summary and Conclusions,” pp. 90-98.

58. Darsie, Marvin L. “A Method of Reporting the Significance of Intelligence
Tests to Parents and Teachers.” School and Society, 22:597-600,
November 7, 1925.




The previous material is representative of opinion as to
what practice should be. The actual practice is well represented
by the 411 secondary schools which supplied information
as to whom they made intelligence test results available. Such
test results are available to all the teachers in nine-tenths
(91%) of the schools, to only (59) classroom teachers in 5%,
to home room teachers only in 1%, to no teachers at all in 3%,
to all parents in 3%, to parents in special cases in 38%, to all
pupils in 1% and to pupils in special cases in 11% of the schools.
There are several rather surprising facts which come to light
through this analysis: first, that 3% of schools make results
available to all parents; second, that such a large per cent (38%)
of schools make intelligence test results available to parents
in special cases; third, that 3% of the schools refuse to make
results available to any of the teachers; and fourth, that results
are available to some or all the teachers in 97% of the
schools. This latter fact seems to be an indication that intelligence
tests are being given for some actual use in the schools,
not merely for filing.
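
The 97% figure follows from the four teacher categories, which between them account for every school. A short Python check of that arithmetic, using only the per cents quoted above (the variable names are illustrative):

```python
# Per cent of the 411 schools in each teacher-access category, as quoted in the text.
teacher_access = {
    "all teachers": 91,
    "classroom teachers only": 5,
    "home room teachers only": 1,
    "no teachers": 3,
}

total = sum(teacher_access.values())
some_or_all_teachers = total - teacher_access["no teachers"]

print(total)                 # the four categories cover all schools
print(some_or_all_teachers)  # schools giving results to some or all teachers
```

The parent and pupil per cents are not included in the sum because, as footnote 59 notes, they overlap with the teacher categories.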

There are several interesting comparisons in practice among
the various types of schools. Results are available to all teachers
in 88% of the senior high schools, 91% of the junior high
schools, and 97% of the six year schools. The senior high
school tends slightly to be the most conservative in this regard.
Surprising in light of the previous figures is the fact that
5% of the senior high schools make results available to all
parents, while in the junior high schools the per cent is 3,
and in six year schools it drops to 1.1.

Considering  the  various  combinations  of  people  having  ac- 
cess to  the  results,  the  most  popular  plan  was  to  make  them 
available  only  to  all  the  teachers  (followed  by  about  half  the 
schools).  The  second  most  popular  plan  was  to  make  results 
available  to  all  teachers  and  to  parents  in  special  cases.  This 
scheme  was  followed  in  slightly  more  than  a fourth  of  the 
schools. 

Summary.  Opinion  is  varied  as  to  who  should  be  informed 
concerning  the  results  of  the  intelligence  tests.  The  majority 
seem  to  favor  the  policy  of  furnishing  teachers  with  the  rela- 
tive brightness  of  their  pupils,  either  in  the  form  of  I.  Q.s  or 


59. The word “only” is meant to exclude other teachers, not parents or
pupils.




letter ratings. Practically all teachers desire this information
on their pupils. It also seems to be the consensus of opinion
that administrators should proceed slowly and cautiously in
informing pupils or parents of the results of such tests.

Practice  seems  to  conform  relatively  well  to  the  opinions 
which  have  been  expressed.  About  97%  of  the  schools  make 
results  available  to  some  or  all  of  their  teachers  (60)  while 
only  3%  inform  all  the  parents  and  1%  inform  all  the  pupils. 

14.  What  Uses  Are  Made  of  Intelligence  Tests? 

The list of uses of intelligence tests consists of the uses
checked by the 493 administrators and the 1600 teachers, the
administrators’ list and teachers’ list being given separately.
The per cents which are given following each use refer to the
per cent of total number returning the check lists. If 56% is
given, as in the first use, it means that 56% of the schools use
intelligence tests in forming ability groups. Using the total number,
493 for administrators and 1600 for teachers, as the base
makes possible a comparison of the relative frequency of use
either on various tests or various items.

The  administrators  use  intelligence  tests  principally: 

1. To help in forming ability groups within a grade. (56%)

2.  To  help  in  determining  whether  a pupil  is  working  up  to  his  or  her 
capacity.  (55%) 

3.  To  aid  in  determining  which  pupils  are  capable  of  doing  exceptional 
work.  (52%) 

4. To aid in studying and advising failing pupils. (50%)

5. To furnish information concerning the probable success a pupil will have
in a certain curriculum. (44%)

6. To furnish an estimate of the pupil's probable success in college. (42%)

7. To aid in determining whether pupils should be recommended to college
or university. (38%)

8. To bring about a better understanding of the capabilities of the pupil
when discussing educational or vocational plans of the pupils with the
parents. (38%)

9.  To  furnish  information  concerning  the  probable  success  of  a pupil  in  a 
certain  subject.  (29%) 

60. When principals say that they make results available to all their teachers,
it does not necessarily indicate that it is made available to them in a form
that is easily used or that all the teachers are aware that they can obtain
such results. In some 70 of these schools the teachers were asked if results
were available to them. They usually answered that such results were
available, but the number replying that results were to be had only on special cases,
or not at all, was sufficient to show a weakness in the way such results were
made accessible to the teachers. Other replies indicated that results were
available but not easily obtainable.




10. To furnish an estimate of the pupil’s probable success in high school. (22%)

11.  To  compare  the  standing  of  the  school  with  the  test  norms.  (20%) 

12.  To  satisfy  parents  that  their  children  have  been  marked  fairly.  (15%) 

13. To aid in determining the promotion of pupils from one grade to another.
(14%)

14.  To  help  in  forming  the  ability  groups  within  the  room.  (12%) 

The  teachers  use  intelligence  tests: 

1. To enable the teacher to tell whether poor work is due to lack of ability
or to other factors which can be corrected. (13%)

2. To aid in discovering which pupils are capable of doing exceptional work.
(10%)

3.  To  aid  in  studying  and  advising  failing  pupils.  (8%) 

4.  To  furnish  an  estimate  of  the  pupil's  probable  success  in  college.  (8%) 

5. To furnish an estimate of the pupil’s probable success in high school. (7%)

6.  To  bring  about  a better  understanding  of  the  capabilities  of  the  pupil 
when  discussing  educational  or  vocational  plans  of  the  pupils  with  the 
parents.  (7%) 

7.  To  form  ability  groups  within  the  room.  (6%) 

8.  To  furnish  information  concerning  the  probable  success  a pupil  will  have 
in  a certain  curriculum.  (6%) 

9.  To  furnish  information  concerning  the  probable  success  of  a pupil  in  a 
certain  subject.  (4%) 

The  outstanding  fact  about  the  uses  of  intelligence  tests 
is  that  teachers  make  relatively  little  use  of  them.  The  largest 
per  cent  of  teachers  making  use  of  intelligence  tests  on  any 
item is thirteen. That figure is less than one-fourth of the
fifty-six per cent at the head of the administrators’ list.
These data would seem to indicate that there is much
which  needs  to  be  done  to  encourage  teachers  to  use  intelli- 
gence test  results. 


SELECTED  REFERENCES 

Book, William F. The Intelligence of High School Seniors. New York: The
Macmillan Company, 1922, 371 pp.

Reports  a survey  of  the  intelligence  of  high  school  seniors  in  Indiana  high 
schools. 

Buckingham,  B.  R.  Research  for  Teachers.  New  York:  Silver,  Burdett  and 
Company,  1926,  Chapter  III. 

Brief  but  helpful  discussion  of  intelligence  tests. 

Dearborn,  Walter  F.  Intelligence  Tests.  Boston:  Houghton  Mifflin  Company, 
1928,  336  pp. 

Devoted  entirely  to  treating  intelligence  tests. 

Freeman,  Frank  N.  Mental  Tests.  Boston:  Houghton  Mifflin  Company,  1926, 
503  pp. 

A most  comprehensive  treatment  of  the  history,  principles,  and  applications 
of  mental  tests. 




Jones, Arthur J. Principles of Guidance. New York: McGraw-Hill Book
Company, 1930, Chapter IX.

Discusses the use of tests in studying the individual, with emphasis on
intelligence tests.

Kelley,  Truman  Lee  Interpretation  of  Educational  Measurements.  Yonkers-on- 
Hudson,  New  York:  World  Book  Company,  1927,  Chapter  VIII. 

Presents  evidence  on  the  community  of  function  between  achievement 
and  intelligence  measures. 

Koos,  L.  V.  and  Kefauver,  G.  N.  Guidance  in  Secondary  Schools.  New  York: 
The  Macmillan  Company,  1932,  Chapter  X. 

An  excellent  discussion  of  the  value  of  mental  tests  in  guidance. 

National Society for the Study of Education. “Intelligence Tests and Their
Use.” Twenty-First Yearbook. Bloomington, Illinois: Public School Publishing
Company, 1923, 275 pp.

Part I deals with general principles of intelligence testing while Part II
deals  with  the  administrative  use  of  such  tests. 

Odell,  C.  W.  Educational  Measurement  in  High  School.  New  York:  The  Century 
Company,  1930,  Chapter  XV. 

Pintner,  Rudolph  Intelligence  Testing.  New  Edition,  New  York:  Henry  Holt 
and  Company,  1931,  555  pp. 

A most  comprehensive  treatment  of  intelligence  testing. 

Ruch,  G.  M.  and  Stoddard,  G.  D.  Tests  and  Measurements  in  High  School 
Instruction.  Yonkers-on-Hudson,  New  York:  World  Book  Company,  1927. 
Chapter  XII. 

Contains  a brief  discussion  of  available  intelligence  tests. 

Symonds,  Percival  M.  Measurement  in  Secondary  Education.  New  York:  The 
Macmillan  Company,  1927,  Chapter  IV. 

Describes the intelligence tests and methods of giving them. Chapters
XIV-XXIV discuss their use.

Terman,  Lewis  M.  The  Measurement  of  Intelligence.  Boston:  Houghton  Mifflin 
Company,  1916,  362  pp. 

Basic  treatment  of  the  Stanford  Revision  of  the  Binet-Simon  Scale. 

Terman,  Lewis  M.  The  Intelligence  of  School  Children.  Boston:  Houghton  Mifflin 
Company,  1919,  317  pp. 

One  of  the  earlier  books  which  contributed  to  the  popularity  of  intelligence 
tests. 

Thorndike, E. L. et al. The Measurement of Intelligence. New York: Teachers
College  Bureau  of  Publications,  Columbia  University,  1926,  616  pp. 

A technical  discussion  of  various  phases  of  intelligence  testing. 


CHAPTER  V 

THE  ADMINISTRATION  OF  STANDARDIZED 
ACHIEVEMENT  TESTS 

Introduction 

Questions  dealing  with  the  administration  of  standardized 
achievement  tests  are  considered  in  this  chapter.  Material 
which  is  available  in  the  form  of  opinion,  experimental  evi- 
dence, and  practice  is  presented  wherever  it  is  pertinent  to 
the  questions  raised.  It  is  hoped  that  a consideration  of  these 
data  will  enable  the  schools  to  better  solve  these  problems. 

The  questions  which  are  discussed  are  those  for  which 
it  seems  to  be  most  troublesome  to  find  satisfactory  answers. 
These  questions  are  listed  by  the  number  of  the  section  in 
which  they  are  presented. 

1. What are the relative merits of the standardized achievement tests as
compared with the teacher-made tests?

2.  What  is  the  relative  emphasis  placed  on  standardized  achievement  testing 
by  the  various  subject-matter  departments? 

3.  When  should  standardized  achievement  tests  be  given? 

4.  Who  should  give  standardized  achievement  tests? 

5.  Who  should  score  the  standardized  achievement  tests? 

6.  To  whom  should  standardized  achievement  test  results  be  made  available? 

7.  How  should  standardized  achievement  test  results  be  recorded? 

8.  In  what  form  should  the  test  results  be  made  available? 

9.  What  uses  are  made  of  standardized  achievement  tests? 

1.  What  Are  the  Relative  Merits  of  the  Standardized 
Achievement  Tests  as  Compared  With  the  Teacher- 
Made  Tests? 

The  most  important  point,  in  comparing  standardized  ach- 
ievement tests  with  teacher-made  tests,  is  that  such  tests  sup- 
plement each  other  in  the  measuring  program  of  the  school. 
Each  type  has  its  specific  functions  and  the  problem  becomes 
one  of  recognizing  the  place  of  each  type  in  the  testing  of 
the  school. 




Advantages  of  each  type.  One  of  the  outstanding  advant- 
ages of  the  standardized  achievement  test  is  that  it  is  supplied 
with  norms  which  have  been  derived  from  the  results  of  a 
large  group  of  pupils.  Where  the  norms  have  been  carefully 
determined,  they  are  decidedly  an  asset,  but  their  value  in 
many  cases  has  been  somewhat  overrated.  Norms  could  be  of 
two  kinds.  One,  a large  variety  of  norms,  such  as  separate 
norms  for  large  and  small  cities,  and  rural  communities,  for 
varying  levels  of  ability,  for  various  parts  of  the  country,  for 
various  courses  of  study,  and  so  on  for  any  groups  which 
might be important. A second type of norm is that which has
been derived from a large random population which is representative
of conditions throughout the country. The main difficulties
are that the norms are based on a number of students in limited
kinds of areas and that the norms of various tests are not
comparable.

Another  advantage  of  the  standardized  tests,  which  may 
also  be  a disadvantage,  is  that  the  material  covered  in  the  test 
is  that  which  is  most  commonly  taught.  This  means  that  a 
standardized  test  has  most  possibilities  in  the  fields  in  which 
there  is  general  agreement  as  to  content.  Where  a course  diff- 
ers widely  from  the  usual  presentation,  such  tests  are  not 
appropriate.  However,  such  a test  covering  the  desired  field 
of  knowledge  has  a distinct  advantage  in  helping  the  teacher 
to  determine  whether  she  has  covered  the  field  satisfactorily. 
Where  the  desired  objectives  of  the  course  are  the  same  as 
those  expressed  in  the  test,  it  helps  prevent  an  introduction 
of  irrelevant  material  or  an  over-emphasis  on  some  portion 
to  the  detriment  of  the  remainder  of  the  course. 

A third  advantage  is  that  standardized  tests  are  usually 
made  by  experts  in  their  fields  who  have  selected  the  items 
very  carefully  and  that  the  test  thus  represents  more  care 
and  thought  than  is  usually  expended  on  teacher-made  tests. 

The  fourth  advantage  is  that  standardized  tests  have  usu- 
ally been  proven  to  be  sufficiently  reliable  as  a measuring 
instrument.  In  this  connection  it  should  be  mentioned  that 
two  forms  of  similar  difficulty  are  usually  provided,  making 
possible either retests, a more reliable measure (using both
forms), or a measure of progress of the pupils.

A fifth advantage is that the standardized test is already
prepared and is usually easily scored. The saving of time may
more  than  compensate  for  the  cost  of  the  tests. 

The  main  advantage  of  the  teacher-made  test  is  that  it  meas- 
ures the  material  which  has  been  covered  in  the  class.  The 
teacher  can  use  her  own  tests  to  measure  all  phases  of  the  pup- 
ils’ achievements  which  she  desires  to  measure,  or  which  she 
is  clever  enough  to  devise  ways  of  measuring. 

The  proper  construction  of  tests  by  the  teachers  has  value 
from  the  view  point  of  the  supervisor.  This  value  lies  in  the 
teachers  considering  the  aims  and  objectives  of  the  subjects 
taught,  evaluating  the  materials  which  are  most  important 
and  considering  ways  of  measuring  these  desired  outcomes. 
The  cooperative  development  of  teacher-made  tests  offers  a 
most  useful  device  for  stimulating  interest  on  the  part  of  tea- 
chers in  the  curriculum. 

Teacher-made  tests  can  be  of  two  types,  the  so-called  new- 
type  or  objective  tests  and  the  essay  or  traditional  tests.  The 
advantages and disadvantages of each type have been
discussed at length in the literature and are briefly summarized
in  the  next  chapter. 

Experimental  evidence.  There  is  a considerable  body  of 
experimental  literature  dealing  with  the  relative  merits  of 
standardized  achievement  tests  and  teacher-made  tests.  There 
is  also  much  more  material  on  the  merits  of  the  various  kinds 
of  teacher-made  tests,  especially  those  of  essay  or  objective 
tests.  This  latter  problem  is  discussed  somewhat  at  length  in 
the  next  chapter. 

The  main  method  of  evaluation  has  been  by  studying  the 
validity  and  reliability  of  the  standardized  as  compared  with 
the  non-standardized  test.  A summary  of  the  studies  on  the 
relative validity of standardized achievement tests as compared
to objective tests seems to indicate that with tool subjects,
standardized achievement tests were more valid. However,
“where tests are not on tool subjects, a test based on
the  subject  matter  taught  is  more  valid  than  a standardized 
test  not  directly  based  on  material  covered”  (1). 

Studies  on  the  reliability  have  shown  that  the  standardized 
tests  were  usually  more  reliable  than  were  teacher-made 

1. Lee, J. Murray and Symonds, Percival M. “New-Type or Objective
Tests: A Summary of Recent Investigations.” Journal of Educational
Psychology, 24:21-38, January, 1933.




tests (2). Most of the data have dealt with the tests when given
at the end of the year. There is one investigation on the relative
reliability when tests are given in the middle of the term.
This is a study by Orleans and Symonds (3). A teacher-made
objective test and a standardized test in algebra were given
in the middle of the term. It was found that the reliability of
the standardized test was less when given in the middle of the
term than when given at the end (.72 compared with .89).
The teacher-made test showed slightly higher reliability (.74
compared with .72) than did the standardized test. In interpreting
this last statement, the fact that slightly fewer items
were included in the standardized test than in the teacher-made
test (based on median scores obtained) should be considered.
It should also be noted that Orleans was probably
author of the teacher-made tests and is co-author of the standardized
test used. What the study shows is that standardized
tests do not have as high a reliability when given in the middle
of the term as at the end, and that experts can make just
as reliable non-standardized tests as they can standardized.
This last statement seems to indicate that the expertness of
the test maker is what will determine the reliability of the test,
not the fact of whether it is standardized or not.
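
The caution about test length can be made concrete with the Spearman-Brown formula, which projects the reliability of a test lengthened by a given factor. The sketch below is an illustration only. The study is not reported here as having applied this formula, and the 10 per cent lengthening is a hypothetical figure, since the exact item counts are not given.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Project the reliability of a test whose length is multiplied by length_factor.

    Spearman-Brown formula: r_k = k * r / (1 + (k - 1) * r)
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical case: the standardized test (.72 at mid-term) made 10 per cent longer.
projected = spearman_brown(0.72, 1.10)
print(round(projected, 3))
```

Under that assumed lengthening the projected reliability rises from .72 to about .74, so a difference of the size observed could plausibly arise from test length alone.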

Preferences  of  Teachers.  Some  1600  secondary  school  tea- 
chers were  asked  to  check  the  statement — “I  usually  prefer 
standardized  achievement  tests  to  my  self-made  test”— if 
they  agreed  with  it.  If  they  disagreed  they  were  to  write  “no” 
in  front  of  it.  Twenty  per  cent  of  the  teachers  indicated  that 
they  preferred  standardized  achievement  tests  while  thirty 
per cent wrote “no” by the statement and fifty per cent did
not  answer.  These  data  seem  to  indicate  that  about  three 
teachers  out  of  five  of  those  replying  prefer  their  own  self- 
made  tests.  The  favorable  vote  outweighed  the  unfavorable 
only  in  the  commercial  department  and  there  it  was  nearly 
three  to  one  for  the  standardized  tests.  This  difference  in 
the  commercial  department  can  perhaps  be  accounted  for 

2.  For  a summary  of  data  see  Ruch,  G.  M.  The  Objective  or  New-Type 
Examination.  Chicago:  Scott  Foresman  and  Company,  1929,  478  pp. 

3.  Orleans,  Joseph  B.  and  Symonds,  Percival  M.  “The  Comparative  Relia- 
bilities of  Standardized  and  Teacher-made  Achievement  Tests  when  given  in 
the  Middle  of  the  Year.”  Journal  of  Educational  Research,  25:127-128,  Febru- 
ary, 1932. 




by  the  policy  of  publishers  to  provide  free  tests  based  on  the 
course  of  study. 
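
The "three teachers out of five" figure follows from dropping the teachers who did not answer from the base. A minimal sketch of that arithmetic, using the percentages reported above (the function name is illustrative only):

```python
def share_of_those_replying(agree_pct: float, disagree_pct: float) -> tuple[float, float]:
    """Re-express two response percentages on the base of those who replied."""
    replying = agree_pct + disagree_pct
    if replying <= 0:
        raise ValueError("at least one teacher must have replied")
    return agree_pct / replying, disagree_pct / replying


# 20 per cent preferred standardized tests, 30 per cent wrote "no",
# and the remaining 50 per cent did not answer.
prefer_standardized, prefer_own = share_of_those_replying(20, 30)
print(f"{prefer_standardized:.0%} standardized, {prefer_own:.0%} self-made")
```

On the base of those replying, the split is two in five for standardized tests and three in five for self-made tests.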

Differences  in  use  by  teachers.  A list  of  some  19  different 
uses  was  submitted  to  the  teachers  mentioned  in  the  previous 
paragraph  and  they  were  asked  to  check  the  uses  which 
they  made  of  the  different  types  of  tests.  The  number  check- 
ing each  use  was  turned  into  a per  cent  by  dividing  by  1600. 
This  gives  the  per  cent  of  teachers  who  use  tests  for  the  pur- 
pose indicated.  These  per  cents  are  probably  too  small,  for 
teachers  not  answering  were  included  in  the  base,  as  it  was 
not  known  whether  they  did  not  use  the  tests  or  only  neglected 
to  answer. 

The  most  frequent  use  of  teacher-made  tests  was  to  aid  in 
determining  the  pupils’  mark.  Fifty-five  per  cent  checked  this 
use  for  objective  tests  while  thirty  per  cent  checked  it  for 
essay tests. The most frequent use of standardized tests was
to  compare  the  results  with  the  norms;  twenty  per  cent  of 
the  teachers  checked  such  use.  These  per  cents,  55,  30  and 
20,  give  an  idea  of  the  relative  emphasis  placed  on  the  vari- 
ous tests  in  use  in  the  high  schools.  In  general  the  standard- 
ized test  differed  from  the  others  in  its  emphasis  on  compari- 
sons, either  with  the  norms  or  with  several  classes,  while  for 
the  teacher-made  tests,  they  stressed  marking  and  diagnosis. 

Difference  in  use  by  administrators.  When  the  responses 
of  the  administrators  as  to  the  uses  of  tests  were  studied 
there  was  an  exact  agreement  with  the  teachers  as  to  the 
principal  uses.  The  standardized  achievement  tests  were  used 
most  frequently  to  compare  results  with  the  norms.  The  essay 
and  objective  tests  were  used  most  frequently  for  determining 
failures  and  promotions.  The  detailed  lists  of  uses  are  given  at 
the  end  of  the  chapter.  These  are  also  discussed  at  greater 
length  in  Chapter  VI. 

Teachers’  desire  for  comparative  results.  Some  more  in- 
formation can  be  furnished  by  studying  the  desires  of  teach- 
ers for  comparative  results  from  similar  classes  in  either  the 
school,  the  school  system  or  the  state.  Very  few  teachers 
(about  5%)  stated  that  they  definitely  did  not  want  their  class- 
es given  the  same  tests  as  were  given  the  other  classes.  Over 
a fourth  of  the  teachers  did  not  answer  the  question.  The 
majority (67%) wanted their classes given the same tests
used throughout the school. The interest in comparative results
decreases as the area is enlarged: Only 40% were interested
in results from all classes in the school system and only
20% for state-wide results.

Application  to  the  testing  program.  The  data  which  have 
been  considered  in  this  section  would  make  it  seem  advisable 
to  include  both  standardized  achievement  tests  and  teacher- 
made  tests  in  any  adequate  program  of  measurement  in  the 
secondary  schools.  The  facts  presented  show  that: 

1.  Standardized  achievement  and  teacher-made  tests  each  possessed  certain 
advantages  which  the  other  did  not. 

2.  Experimental  evidence  tends  to  indicate  that  standardized  achievement 
tests  are  more  reliable,  especially  when  given  at  the  time  the  norms  are 
provided  for,  and  that  they  are  as  valid  for  tool  subjects.  The  extent  to 
which  they  are  valid  depends  in  the  final  analysis  upon  the  amount  of 
agreement  between  the  objectives  of  the  course  and  the  objectives  meas- 
ured by  the  test. 

3.  More  teachers  prefer  their  own  self-made  tests  than  standardized  tests. 

4.  Differences  in  use  exist  between  the  two  types  of  tests  which  make  it 
desirable  to  use  both  types. 

5.  Though  teachers  usually  prefer  their  own  tests,  they  would  like  the  same 
tests  used  throughout  the  school  and  to  a lesser  degree  throughout  the 
system  and  state. 

The  remainder  of  this  chapter  is  limited  to  a discussion 
of the various problems arising in the administration of the
standardized  achievement  testing  program.  The  teacher-made 
tests  are  further  discussed  in  the  following  chapter. 

2.  What  is  the  Relative  Emphasis  Placed  on  Standardized 
Achievement  Testing  by  the  Various  Subject-Matter 
Departments? 

This  problem  of  relative  emphasis  is  considered  from  two 
angles.  First,  the  total  amount  of  standardized  testing  done 
by  each  department  is  studied  to  find  out  where  the  most  test- 
ing is  being  done.  Second,  the  median  number  of  standardized 
tests  as  compared  with  the  median  number  of  teacher-made 
tests  is  studied  by  departments.  This  gives  an  excellent  pic- 
ture of  the  relative  amount  of  standardized  testing  which  is 
done,  as  compared  with  the  amount  of  teacher-made  testing. 
It  also  indicates  the  relative  amount  of  testing  done  by  each 
department  instead  of  the  total  amount. 

Total  amount  of  testing  by  departments.  The  fields  in 
which standardized achievement tests are given are ranked
in order of total frequency of mention in Table XVIII. In
studying  this  table,  the  number  of  tests  given  in  any  one  field 
should  not  be  considered  too  seriously,  due  to  the  large  num- 
ber of  schools  which  did  not  indicate  the  numbers  of  tests 
used.  This  was  especially  true  for  large  schools  doing  a great 
deal  of  testing.  Another  disturbing  factor  is  that  one  large 
school  could  well  overbalance  the  report.  An  example  of 
this  latter  statement  occurs  under  Personality  and  Character 
where  one  junior  high  school  reported  using  3,020  of  the 
Hughes  Rating  scales (4). 

TABLE XVIII

The Number of Schools Giving Tests in the Various Fields and
the Number of Tests Reported Given

                              Schools Giving Tests               Number of Tests Given
                                Number              Per
Rank Subject                     J    S    6    T  Cent    Junior    Senior    6 year     Total
 1.  English                    39   77   38  154    32    18,213    51,441    20,329    89,982
 2.  Achievement Batteries      57   37   36  130    26    23,812    20,053    19,845    63,710
 3.  Mathematics                39   51   19  109    22    12,526     7,815    11,394    31,735
 4.  Science                     4   36   10   50    10         0     7,174     3,299    10,473
 5.  Foreign Language            6   28    9   43     9       210     2,237       760     3,207
 6.  History & Social Studies   10   22    8   40     8     2,192     4,577     1,386     8,155
 7.  Latin                       7   15    7   29     6       909       478       393     1,780
 8.  Industrial Arts             6   10    4   20     4       825     2,607       200     3,632
 9.  Commercial                  0   11    4   15     3         0     5,848       560     6,408
10.  Home Economics              0    4    3    7     2         0        52     1,200     1,252
11.  Personality & Character     3    2    0    5     1     3,020       422         0     3,442
12.  Fine Arts                   0    2    1    3     1         0         0         0         0
13.  Miscellaneous               3    7    4   14     3     4,534    17,980     3,300    25,814

     Total Number of Schools   123  276   94  493
     Total Number of Tests                                 66,241   120,684    62,666   249,591

It  is  significant  to  note  that  some  249,591  standardized  ach- 
ievement tests  were  reported  as  given  as  compared  to  nearly 
200,000  group  intelligence  tests.  This  would  seem  to  indi- 
cate that  there  are  nearly  as  many  intelligence  tests  used  as 
there  are  achievement  tests,  which  further  leads  to  the  con- 
clusion that  the  standardized  achievement  test  has  not  at- 
tained the  same  relative  popularity  in  the  high  school  as  has 

4.  Hughes,  W.  H.  “A  Rating  Scale  for  Individual  Capacities,  Attitudes  and 
Interests.”  Journal  of  Educational  Method,  3:58-65,  October,  1923. 




the  intelligence  test.  These  figures  represent  the  testing  re- 
ported from  493  secondary  schools. 
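
The totals quoted above can be checked against the body of Table XVIII. The sketch below sums the number of tests reported for each type of school. The figures are those read from the table, and the dictionary layout is simply an assumption made for illustration.

```python
# Number of tests reported given, as read from Table XVIII:
# (junior, senior, six-year) for each field.
TESTS_GIVEN = {
    "English": (18213, 51441, 20329),
    "Achievement Batteries": (23812, 20053, 19845),
    "Mathematics": (12526, 7815, 11394),
    "Science": (0, 7174, 3299),
    "Foreign Language": (210, 2237, 760),
    "History & Social Studies": (2192, 4577, 1386),
    "Latin": (909, 478, 393),
    "Industrial Arts": (825, 2607, 200),
    "Commercial": (0, 5848, 560),
    "Home Economics": (0, 52, 1200),
    "Personality & Character": (3020, 422, 0),
    "Fine Arts": (0, 0, 0),
    "Miscellaneous": (4534, 17980, 3300),
}


def column_totals(table: dict[str, tuple[int, int, int]]) -> tuple[int, int, int]:
    """Sum the junior, senior, and six-year columns over all fields."""
    junior = sum(row[0] for row in table.values())
    senior = sum(row[1] for row in table.values())
    six_year = sum(row[2] for row in table.values())
    return junior, senior, six_year


junior, senior, six_year = column_totals(TESTS_GIVEN)
grand_total = junior + senior + six_year
print(junior, senior, six_year, grand_total)
```

The three column sums agree with the "Total Number of Tests" row, and together they give the 249,591 tests cited in the text.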

It  is  clearly  seen  that  by  either  criterion  English  ranks  first 
and  mathematics  ranks  second  in  the  amount  of  standardized 
testing  done,  among  the  various  departments.  Since  there 
are  more  tests  available  in  English (5)  this  is  not  so  surpris- 
ing. However,  mathematics  ranks  fourth  in  the  number  of 
available  tests,  yet  second  in  the  amount  given.  The  empha- 
sis in  the  junior  high  schools’  program  of  standardized  ach- 
ievement testing  seems  to  be  in  the  use  of  achievement  batter- 
ies. These  batteries  usually  test  such  subjects  as  Reading, 
Arithmetic,  Spelling,  Language  Usage,  History,  Literature, 
Geography, Elementary Science, and Health. The outstanding
tests now available are the New Stanford Achievement
Test (6), Modern School Achievement Test (7), and the Public
School Achievement Test (8).* The separate subjects receiving
most attention are English and Mathematics.

The  senior  high  school  testing  program  stresses  English 
to  the  extent  that  about  44%  of  the  tests  used  are  English 
tests. Mathematics and science are about equally popular as
far  as  numbers  given  are  concerned.  Mathematics  had  the 
slight  edge  considering  the  number  of  schools  giving.  Using 
both  the  frequency  of  schools  giving  and  number  given,  mod- 
ern foreign  language,  Latin,  history  and  the  social  studies, 
and  commercial  subjects  all  fall  in  a third  lower  group.  The 
least  standardized  testing  is  done  in  industrial  arts,  home 
economics,  fine  arts,  and  physical  education.  The  six  year 
schools’  program  is  rather  similar  to  the  senior  high  school. 
Slightly  more  emphasis  is  placed  on  mathematics  and  home 
economics. 

Relative amount of testing done by departments. The relative
amount of standardized as compared to teacher-made
testing is given in Table XIX. In addition to the median number
of each given in one semester, the ratio of the teacher-made
to the standardized is given in the last column.

5. See Table II, p. 20.

6. Published by the World Book Company, Yonkers-on-Hudson, New York.

7. Published by the Teachers College Bureau of Publications, Columbia
University, New York.

8. Published by the Public School Publishing Company, Bloomington, Illinois.

* Note: Recently the Progressive Achievement Tests have been published by
the Southern California School Book Depository, Hollywood, and the Metropolitan
Achievement Tests by the World Book Company.

TABLE XIX

Median Number of Standardized Achievement Tests and
Teacher-Made Tests Given During One Semester
Ending February 1932

                       Standardized          Teacher-Made       Ratio of
                                No. of              No. of   Teacher-made to
Department            Median  Teachers    Median  Teachers   Standardized
Commercial               2.4       102        95       125      4.1
Science                    5       164       9.6       179     12.0
Mathematics              1.0       192       8.5       226      8.5
Social Studies            .7        96        85       117     11.9
Latin                     .8        36       8.0        54     10.0
English                  1.2       284       7.0       324      5.8
Mod. For. Lang.            5        79        65        92      8.5
History                   .7       105       6.0       117      8.6
Home Economics            .6        60       6.0        76     10.0
Industrial Arts           .6        82       4.6       103        *
Phys. Ed.                 .6        18       4.0        28
Fine Arts                 .6        26       3.4        46
Total                     .9      1294       7.2      1487      8.0

*Ratios below this point not significant due to the fact that it is impossible
for the median of standardized tests to decrease below .5.

The  fact  that  these  data  were  obtained  from  the  teacher 
for  the  first  semester  probably  meant  that  fewer  standardized 
tests  were  given  than  would  have  been  the  case  had  records 
from  the  second  semester  been  obtained.  Though  the  English 
department  gave  the  largest  number  of  standardized  tests, 
it  can  be  seen  from  Table  XIX  that  the  commercial  depart- 
ment uses  the  largest  number  of  such  tests  per  teacher  (2.4). 
English  ranks  second  and  mathematics  third.  The  commercial 
department  not  only  gives  the  largest  number  of  standard- 
ized tests,  but  also  the  largest  number  of  teacher-made  tests. 
The  departments  are  ranked  in  order  of  the  frequency  with 
which  they  gave  the  teacher-made  tests;  comparisons  may  be 
easily  made.  The  science  department  places  more  emphasis 
on  teacher-made  tests  than  do  any  of  the  departments  (ratio 
of  12.0),  followed  by  the  social  studies  group. 
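
The ratio column of Table XIX is the teacher-made median divided by the standardized median. The following sketch recomputes it for several rows whose medians are read directly from the table. The names used are illustrative, and the result is rounded to one decimal place as in the table.

```python
# (standardized median, teacher-made median) for rows read from Table XIX.
MEDIANS = {
    "English": (1.2, 7.0),
    "Mathematics": (1.0, 8.5),
    "History": (0.7, 6.0),
    "Home Economics": (0.6, 6.0),
    "Total": (0.9, 7.2),
}


def teacher_made_ratio(standardized: float, teacher_made: float) -> float:
    """Teacher-made tests given per standardized test, to one decimal place."""
    if standardized <= 0:
        raise ValueError("standardized median must be positive")
    return round(teacher_made / standardized, 1)


ratios = {dept: teacher_made_ratio(s, t) for dept, (s, t) in MEDIANS.items()}
for dept, ratio in ratios.items():
    print(f"{dept:<16}{ratio:>5}")
```

For these rows the computed ratios match the last column of the table, including the over-all figure of about eight teacher-made tests for every standardized test.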

There are about eight times as many teacher-made tests
given as standardized tests. Though standardized achievement
testing is an important part in the secondary school program,
the development and improvement of the tests that the
teacher makes is much more important.

3.  When  Should  Standardized  Achievement  Tests  Be 
Given? 

Comparatively  little  has  been  written  on  when  standard- 
ized achievement  tests  should  be  given.  The  assumption  on 
the  part  of  both  the  test  makers  and  users  has  been  that  ach- 
ievement tests  in  the  high  school  should  be  given  at  the  end 
of  the  semester  or  year.  The  norms  provided  for  most  tests 
are  for  the  end  of  the  year.  This  tendency  is  due  to  the  fact 
that  most  high  school  subjects  are  one  year  subjects  which 
have  little  relation  to  the  studies  of  other  years.  Standard- 
ized achievement  tests  which  have  been  constructed  for  use 
at  the  end  of  the  year  when  used  during  the  semester  are  not 
as  reliable  as  carefully  constructed  teacher-made  objective 
tests.  This  has  been  demonstrated  by  Orleans  and  Sym- 
onds(9).  The  difficulty  which  was  mentioned  above,  that  most 
tests  do  not  have  norms  other  than  those  for  the  end  of  the 
year,  has  been  overcome  somewhat  by  a study  by  Sym- 
onds(10).  He  has  provided  a series  of  standards  on  nine  stand- 
ardized tests  in  high  school  subjects  for  December,  March,  and 
June  first.  These  standards  are  an  indication  of  how  such 
norms  can  be  developed  but  obviously  the  authors  or  publish- 
ers are  responsible  for  them. 

To overcome these objections, unit tests (11) have been
developed  which  attempt  to  measure  a relatively  small  part 
of  the  year’s  work.  Only  a few  of  these  provide  standards. 

9.  Orleans,  Joseph  B.  and  Symonds,  Percival  M.  op.  cit. 

10. Symonds, Percival M. Ability Standards for Standardized Achievement
Tests in the High School. New York: Teachers College Bureau of Publications,
Columbia University, 1927, 91 pp. Provides standards on the Hotz Algebra,
Schorling-Sanford Geometry, Van Wagenen American History, White Latin,
Ruch-Crossman Biology, Ruch-Popenoe General Science, Powers General
Chemistry, Haggerty Reading, and American Council Alpha French Tests.

11. Such as:

Bishop and Irwin’s Instructional Tests in Plane Geometry.
Blaisdell’s Instructional Tests in Biology.
Glenn and Gruenberg’s Instructional Tests in General Science.
Glenn and Welton’s Instructional Tests in Chemistry.
Glenn and Obourn’s Instructional Tests in Physics.
Schorling-Clark-Lindell Instructional Tests in Algebra.
    Published by the World Book Company, Yonkers-on-Hudson, New York.

Michigan Instructional Tests in Algebra.
Seattle Solid Geometry Tests.
    Published by the Public School Publishing Company, Bloomington, Illinois.




In  fact  most  tests  of  this  type  are  more  closely  related  to 
teacher-made  tests  than  they  are  to  standardized  tests.  Auth- 
ors of  these  unit  or  instructional  tests  are  doubtful  if  stand- 
ards or  norms  add  to  the  value  of  such  tests.  These  doubts 
are  probably  greater  than  they  otherwise  would  be  if  the 
providing  of  adequate  norms  did  not  entail  so  much  labor. 
These  instructional  or  unit  tests  do  not  furnish  the  teacher 
or  administrator  with  a means  of  comparing  class  standing 
with  the  norms.  A neglected  possibility  in  the  high  school  is 
the  use  of  standardized  achievement  tests  for  diagnosis  at 
the beginning of the year’s work. This use of course is applicable
only in such subjects as second, third or fourth year language,
English, social studies, and sciences. The junior high
school  offers  possibilities  in  mathematics,  English  and  the 
social  studies.  The  practice  of  testing  at  the  beginning  of  the 
semester  is  followed  to  a large  extent  in  the  elementary 
grades.  Stenquist,  in  describing  the  practice  in  the  elementary 
schools  of  Baltimore,  indicates: 

“These surveys are scheduled at the beginning of each term so that teacher,
principal, supervisor, and superintendent may each do his part in devising
intelligent remedial steps” (12).

The  main  reason  for  giving  tests  is  to  improve  the  learning 
situation.  Taking  an  inventory  of  the  skills  and  knowledge 
of  the  class  at  the  beginning  of  the  semester  makes  possible 
immediate  corrective  teaching.  One  research  director  on  his 
check  list  said  that  testing  at  the  end  of  the  semester  resembles 
too  much  the  “locking  of  the  garage  door  after  the  car  has 
been stolen.” Another good reason for testing at the beginning
of the semester is that so many teachers are apt to feel
that final testing is an attempt to measure their efficiency.
Pre-testing removes all such fears and concentrates the attention
of the teacher on the needs of her class, for obviously a
test  at  the  beginning  of  the  semester  cannot  possibly  measure 
the  efficiency  of  the  teacher. 

Summary.  In  most  cases  standardized  achievement  tests 
are only suitable for use at the end of the semester. The main
difficulties in their use at other times are due to lack of suitable
norms  and  low  reliability.  There  are  neglected  possibilities  of 

12. Stenquist, John L. “Getting Research into Practice in a Large School
System.” American School Board Journal, Vol. 19, November, 1930, p. 41-42,
December, 1930, p. 41-42.




using  such  tests  as  pre-tests  in  subjects  which  are  continuous. 
Such use enables the teacher to obtain an immediate understanding
of the capabilities of the pupils. Thus time is not
wasted in teaching material already mastered or in starting
the work so far ahead of the class that they do not understand
it.


4.  Who  Should  Give  Standardized  Achievement  Tests? 

There  is  general  agreement  that  the  teacher  should  be  the 
one  to  give  the  standardized  achievement  tests  in  most  cases. 
Testing  the  outcomes  of  learning  is  part  of  the  total  teaching 
process  and  as  such  should  be  performed  by  the  classroom 
teacher.  It  requires  no  great  skill  to  give  standardized  tests. 
Anyone  capable  of  teaching  should  be  able  to  administer 
such  tests.  A few  precautions  need  to  be  observed: 

1.  The  teacher  should  be  thoroughly  familiar  with  the  test  and  with  the 
manual  of  directions  before  attempting  to  test. 

2.  A period  should  be  used  which  is  free  from  interruptions  and  nothing 
should  disturb  the  testing,  especially  if  the  test  is  one  which  requires 
careful timing. Distracting influences, such as testing just before or after
an important assembly, should be avoided.

3.  Careful  preparation  should  be  made  for  the  testing.  The  number  of  tests 
should  be  verified,  extra  pencils  should  be  on  hand,  and  desks  should 
be  cleared. 

4.  Directions  for  administering  the  test  as  given  in  the  manual  should  be 
followed  verbatim.  The  directions  have  been  carefully  worked  out  by  the 
author  of  the  test  and  should  not  be  deviated  from,  except  in  the  most 
urgent  of  circumstances. 

5.  Timing  must  be  done  carefully  and  exactly.  A stop  watch  is  best  but  by 
using  some  degree  of  care  a watch  with  a second  hand  can  be  used.  The 
time  starting  each  test  should  be  carefully  noted  on  a slip  of  paper — 
“Started  9 hr.  15  min.  and  30  sec. — 10  minutes  allowed — Stop  at  9 hr.  25 
min.  and  30  sec.”  Where  the  test  is  timed,  either  on  the  various  parts  or 
on  the  whole  test,  no  variation  in  timing  is  permitted. 

6.  Pupils’  morale  should  be  kept  up  during  the  test.  They  should  not  be 
made  to  feel  that  it  is  a life  or  death  matter,  but  should  be  encouraged  to 
do  their  best.  Direct  assistance  should  not  be  given  during  the  progress  of 
the  test. 
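
The bookkeeping called for in precaution 5 can be sketched in a few lines of Python. This is only an illustration and not part of any test manual: the function name `stop_time` and the clock-string format are assumptions made for the example, which reproduces the slip-of-paper note given above.

```python
from datetime import datetime, timedelta


def stop_time(start: str, minutes_allowed: int) -> str:
    """Return the clock time (HH:MM:SS) at which a timed test must stop."""
    # Only the time of day matters here; strptime fills in a dummy date.
    started = datetime.strptime(start, "%H:%M:%S")
    stopped = started + timedelta(minutes=minutes_allowed)
    return stopped.strftime("%H:%M:%S")


# The example from the text: started 9 hr. 15 min. 30 sec., 10 minutes allowed.
print(stop_time("9:15:30", 10))  # prints 09:25:30
```

Writing the stopping time down before the test begins, rather than adding the minutes mentally while the pupils are at work, is the point of the precaution.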

The  administering  of  the  tests  furnishes  the  teacher  with 
a more  complete  understanding  of  the  test  than  would  other- 
wise be  possible.  It  is  obvious  that  unless  the  teacher  is  famil- 
iar with  the  test,  she  is  greatly  handicapped  in  providing  for 
remedial  teaching  on  the  basis  of  the  results.  Other  reasons 
for  teachers  doing  their  own  testing  are  economy  and  a better 
attitude  which  is  developed  toward  testing. 




5.  Who  Should  Score  the  Standardized  Achievement  Tests? 

Standardized  achievement  tests  were  scored  by  the  teach- 
ers in  four-fifths  (79%)  of  the  schools  reporting,  by  the  teach- 
er and  research  department  in  13%,  by  the  research  depart- 
ment alone  in  5%,  and  by  clerks  in  2%. 

There is some justification for the teachers' scoring the
achievement tests, for there is probably a great deal a teacher
can learn concerning individual and class needs when she
scores her own test papers. Writers agree more generally that
the teacher should score the achievement tests than that she
should score the intelligence tests.

The  practice  of  the  Beaumont  schools  is  interesting  in  this 
connection.  In  this  system  the: 

“Teachers score their own tests and thus become acquainted with the
weaknesses of their own classes and of individual pupils. In this way teachers
are better able to determine remedial plans to be used” (13).

As  far  as  the  writer  has  been  able  to  discover  this  problem 
has  been  limited  to  discussion.  What  it  really  needs  is  experi- 
mentation to  discover  the  answers  to  questions  like  these: 

1. Do teachers do more or better remedial work when they correct
standardized tests?

2.  Do  teachers  become  better  acquainted  with  the  weaknesses  of  their  pupils 
by  scoring  the  tests? 

3.  Does  the  correcting  of  test  papers  and  the  use  of  whatever  diagnostic 
means  provided  prove  more  effective  in  terms  of  corrective  teaching  than 
the  use  of  the  diagnostic  chart  alone? 

When  the  answers  to  these  questions  are  obtained,  one  will 
be  able  to  arrive  at  a better  statement  than  is  now  possible. 

There is evidence to show that the most effective means
of correcting papers, from the standpoint of pupil learning,
is to have each pupil correct his own paper. The main difficulty
with that proposal is that cheating may take place, and
since the tests are to be used for marking, this must be avoided.
It has been suggested elsewhere (14) that this cheating
can be overcome by the use of a duplicate answer sheet which
is filled out by the pupil as the answers are filled in on the
test paper. The duplicate answer sheet is then turned in to
the teacher with a complete record of the pupil’s responses,
and any changes the pupil makes in correcting can be noted
when the papers are checked by the teacher after they have
been turned in.

13. “Testing Program in Beaumont, Texas.” Journal of Educational
Research, 25:158-159, February, 1932.

14. For a more complete discussion of the problem than can be given here,
see Lee, J. Murray, and Symonds, Percival M. “New-Type or Objective
Tests: A Summary of Recent Investigations (October, 1931-October, 1933).”
Journal of Educational Psychology, 25:161-184, March, 1934.

The suggested plan seems to be nearly as usable on standardized
tests as on new-type tests. One caution should be kept
in mind: each question must have a definite answer. If there
is any question about the correctness of the answer, the method
would not work. Another caution is that such a method
of scoring standardized tests would probably invalidate the test
if it were to be used again at some future time with the same
group. Standardized tests seem to have acquired a certain
sanctity which it is the custom to observe. It does not seem
that familiarity with an achievement test can do the pupil
much harm. There is, of course, the objection that the pupil
might be given the test again, particularly in subjects continuing
for more than a year, and that information might be
passed to other students. The device should not be used
with achievement batteries such as the Stanford or the
Sones-Harry, for it might be necessary to repeat such tests.

In a city-wide achievement survey on the secondary level,
it would be entirely possible to print such answer blanks and,
by following the plan suggested, to have all the tests given and
scored and the results tabulated in one day. Such speed in making
the data available has much to commend it.
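
The duplicate answer sheet plan described above can be sketched as a short Python routine. This is a minimal illustration under stated assumptions: the function name `score_with_duplicate`, the true-false answer format, and the sample answers are all hypothetical. The score is taken from the duplicate sheet turned in before correction, and any item on which the pupil-corrected paper differs from the duplicate is listed for the teacher's attention.

```python
def score_with_duplicate(duplicate, corrected_paper, key):
    """Score from the duplicate sheet and list items altered on the paper.

    duplicate:       answers the pupil recorded while taking the test
    corrected_paper: answers found on the test paper after pupil correction
    key:             the single definite answer for each item
    """
    # The mark is based only on the sheet turned in before correction.
    score = sum(1 for given, right in zip(duplicate, key) if given == right)
    # Item numbers (starting at 1) where the paper no longer matches the sheet.
    changed = [
        number
        for number, (first, later) in enumerate(zip(duplicate, corrected_paper), start=1)
        if first != later
    ]
    return score, changed


key = ["T", "F", "T", "T", "F"]
duplicate = ["T", "T", "T", "F", "F"]  # turned in before correction
paper = ["T", "F", "T", "F", "F"]      # item 2 was altered during correction

score, changed = score_with_duplicate(duplicate, paper, key)
print(score, changed)  # prints 3 [2]
```

The sketch depends on the caution already stated: each item must have one definite answer, or the comparison with the key has no meaning.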

Summary. Standardized achievement tests are scored by
the teachers alone in nearly four-fifths of the schools. There
seems to be justification for this practice, considering that the
teacher becomes familiar with the errors made. This last
statement is opinion, and it is possible that research on this
topic would show that comparatively little value is derived
from correcting tests. Experiments have indicated that the
most effective method of correcting test papers, from the
standpoint of pupil learning, is to have the pupils correct their
own papers. A method is suggested which will permit this and at
the same time prevent any cheating. Pupil correction of
standardized achievement tests has been little considered in the
past, but there would seem to be opportunities in this direction.




6.  To  Whom  Should  Standardized  Achievement  Test 
Results  Be  Made  Available? 

Since  the  main  purpose  of  testing  is  to  improve  the  learn- 
ing situation,  test  results  should  be  available  to  those  who, 
through  their  knowledge  of  the  results,  will  effect  an  improve- 
ment in  the  learning.  This  criterion  will  obviously  include 
the  classroom  teacher,  for  hers  is  the  direction  of  the  learn- 
ing process.  It  will  include  the  pupil  if  it  can  be  shown  that 
a knowledge  of  test  results  improves  his  work.  The  parent 
will  also  be  added  if  knowledge  of  the  progress  of  his  child 
will  affect  the  learning  situation. 

There  seems  to  be  rather  complete  agreement  that  teachers 
should  be  furnished  with  achievement  test  data.  Hildreth  goes 
further,  pointing  out: 

“The teacher needs 1. knowledge of the facts about the capacity and
achievement status of new pupils; 2. diagnostic, or at least analytical, data
for pupils with peculiar difficulties and irregularities in achievement;
3. measures of the progress of pupils over a given period of time;
4. knowledge of a child’s achievement as compared with his capacity;
5. test data as a basis for school experimentation. The possession by
teachers of all these data coupled with proper knowledge as to their use in
instruction results in greater efficiency in obtaining results commensurate
with the effort expended” (15).

There  is  not  complete  agreement  on  whether  pupils  should 
be  given  standardized  achievement  test  results.  Using  the 
criterion  given  at  the  beginning  of  this  section  in  studying 
the  experimental  evidence  of  motivation,  it  is  possible  to  ob- 
tain an  answer.  Thorndike  has  stated  the  case  for  those  who 
are  in  favor  of  giving  results  to  pupils.  He  emphasizes  that: 

“Interest in school work and in personal efficiency therein can gain
tremendously in the minds of many pupils by knowledge of the standings which they
have  achieved  in  various  kinds  of  scholastic  tests — The  final  justification  for 
every  testing  regime  rests  in  Mary  Jones  and  John  Smith,  and  it  therefore 
behooves  all  persons  who  are  making  and  giving  tests  to  take  them  into  partner- 
ship as  soon  and  as  completely  as  is  feasible” (16). 

Turning to the experimental evidence, there are a number
of studies in the field of motivation which are suggestive of
the results of pupils knowing their own record. The study
most directly applicable is one by Symonds and Chase on
improvement in grammar in the sixth grade. They found:

“Test motivation caused learning over and above that which could be
explained by practice. The value of the test motivation may be estimated as
the equivalent of about five sheer repetitions” (17).

15. Hildreth, Gertrude H. Psychological Service for School Problems.
Yonkers-on-Hudson, New York: World Book Company, 1930, p. 73.

16. Thorndike, Edward L. “Standardized Tests and Their Use.” Teachers
College Record, 26:93-116, October, 1924.

Monroe and Engelhart, after reviewing the research studies
in this field, conclude:

“The findings of the studies referred to are almost unanimously in favor of
the contention that knowledge of progress of learning is an effective
stimulus” (18).

These  findings  seem  to  indicate  that  there  is  a gain  in  learn- 
ing which  comes  from  a pupil’s  knowledge  of  his  achieve- 
ment. This  seems  to  rather  definitely  endorse  the  policy  of 
giving  the  pupils  their  results  on  standardized  achievement 
tests. 

Whether  or  not  parents  should  be  given  standardized  test 
results  is  still  a moot  question.  Among  the  main  objections  to 
the  practice  is  the  fact  that  parents  have  not  the  necessary 
knowledge for proper interpretation. There would seem to be
no objection to discussing such test results with the parent
in  conference.  However,  the  present  system  of  reporting 
marks  would  seem  to  be  sufficient  in  most  cases. 

Turning to the practice of the schools, standardized achievement
test results are available to all the teachers in over
three-fourths (77%) of the schools, to classroom teachers
only (19) in one-fifth (21%), to home room teachers only in
0.8%, and to no teachers at all in 1.2%. They are available to all
parents in 15% of the schools, to parents in special cases in
26%, to all pupils in 17%, and to pupils in special cases in 16%
of the schools.

It  is  difficult  to  see  what  justification  two  per  cent  of  the 
schools  have  for  not  making  the  results  available  to  classroom 
teachers.  Admitting  that  there  are  times  when  achievement 
tests may be given for administrative purposes, it would seem
advisable  also  to  use  the  results  for  immediate  improvement 
of  instruction  in  the  classroom. 

One difference in practice among the types of schools
is explainable in light of the different types of testing
programs. This difference is in the number of schools making
results available to all teachers, done by two-thirds of the
senior, 85% of the junior and 91% of the six year high schools.
The junior and six year high schools gave relatively more
batteries of achievement tests than did the senior high schools.
This fact would account for the difference in practice of about
20%. In all other cases the practices of the three types of
schools are practically identical.

17. Symonds, Percival M. and Chase, Doris H. “Practice vs. Motivation.”
Journal of Educational Psychology, 20:19-35, January, 1929.

18. Monroe, W. S. and Engelhart, M. D. “Stimulating Learning Activity.”
Urbana, Illinois: University of Illinois Bulletin Vol. 28, No. 1, Bureau of
Educational Research Bulletin, No. 51, 1930, p. 48.

19. The word “only” is meant to exclude other teachers, not parents or
pupils.

The main difference between theory and experimentation
on the one hand and practice on the other is in the policy of
informing the students of their results. There seems to be good
evidence to indicate that it is desirable that pupils know their
achievements, yet only one-sixth of the schools follow that
practice. Five-sixths of the schools are missing the opportunity
to use an effective motivation for learning.

Returning to the report from Beaumont (20), in the elementary
schools they have the pupils make and study their own profiles
on the tests, often “taking the record home to their parents
who are able to acquaint themselves with the progress
of their children.”

Summary.  There  is  either  agreement  of  opinion  or  evi- 
dence to  indicate  that  standardized  achievement  test  results 
should  be  made  available  to  the  teachers  and  pupils.  Practi- 
cally all  schools  furnish  such  results  to  the  teachers,  but  in 
only  about  one-sixth  of  the  schools  is  that  practice  followed 
for  the  pupils.  There  seems  to  be  no  objection  to  informing 
parents  of  such  test  results,  the  only  difficulty  being  one  of 
interpretation. 

7.  How  Should  Standardized  Achievement  Test  Results 
Be  Recorded? 

Records  are  means  of  making  data  available  for  use.  Usage 
is  thus  the  determining  factor  of  what  facts  should  be  included 
in  any  record  system.  If  no  one  is  to  make  use  of  them,  there 
is  no  reason  to  spend  time  recording  facts,  however  important 
they  would  be  if  they  were  interpreted.  This  criterion  of  utility 
is  the  one  which  each  school  must  use  in  determining  the 
extent  to  which  standardized  achievement  test  results  are 
recorded. The ultimate evaluation of any system of records is
dependent upon the improvement which the use of such records
makes in the learning situation and in the bettering of the
adjustments of individual pupils.

20. op. cit., see footnote 13.

It  is  possible  to  state  how  standardized  achievement  tests 
should  be  recorded  if  the  school  wishes  to  use  the  results  most 
effectively. Wood observes:

“Examining is, in its very nature, a long process of carefully related
measurements and observations which are carefully and conveniently recorded
and very carefully studied . . . I find almost no systematic examining or
recording; the sums of money and energy that we spend supposedly on this
problem are really sacrificed in a series of unrelated and distorted snapshots.

“The point is that an examination result, even if accurate for the moment,
can have little meaning for constructive educational guidance unless it can be
related to one or more comparable measurements of the same function taken
previously, and to comparable measurements of other functions taken at the
same time, as well as previously” (21).

This  statement  stresses  two  necessary  factors.  The  record 
system  must  provide  a method  which  will  make  the  data 
cumulative  and  relate  it  to  other  facts  about  the  pupil.  The 
need for a method of recording standardized achievement
tests is thus the same as for intelligence tests.

Methods of recording results of standardized achievement
tests do not seem to be as permanent in character as the plans
for recording intelligence tests. Only 14% of the schools record
results on the permanent record card alone; when the
permanent record is used in combination with some other
method this per cent rises to 33%. The most popular single plan
is for the individual class teachers to keep the results for their
classes on sheets, 15% of the schools following this scheme.
Ten per cent of the schools file the results on the test card,
9% on sheets filed in the office, and 46% use some other plan
or combination of the ones mentioned.

These facts concerning practice tend to support the criticism
that a large amount of energy is being expended in
“a series of unrelated and distorted snapshots.” Only one-third
of the schools make provision for the unification of the data
by recording results on the permanent record card. As for
permanency, only slightly over one-half (53%) make provision
for it by recording the results either on the permanent
record card or on a special record card. It would appear that
much more can be done in the use of standardized achievement
test results than is now being done. Only a third of the
schools provide a record system which facilitates the effective
use of tests. Even though this third provide the records,
it does not necessarily mean that they are actually using these
records efficiently. One can easily see the great opportunity
for improvement.

21. Wood, Ben D. “The Structure and Content of the Comprehensive
Examination for College Sophomores.” Recent Trends in American College
Education. Chicago: University of Chicago Press, 1931, p. 1D0-207.

8.  In  What  Form  Should  the  Test  Results  Be  Made 
Available? 

When achievement tests are corrected and the points totalled,
a score is arrived at. This score has little meaning in
itself and needs to be translated into terms which most teachers
can understand. For most high school tests, scores are usually
translated into grade norms or percentile norms.

Grade  norms  are  the  average  scores  made  on  the  test  by 
the  pupils  in  each  grade.  Usually  these  norms  are  expressed 
in  terms  of  years  and  months,  so  a grade  norm  of  9.0  would 
be  equivalent  to  the  average  made  by  beginning  ninth  grade 
students,  and  9.1  by  the  average  of  ninth  grade  students  after 
one  month’s  instruction.  This  type  of  norm  is  only  useful 
when  the  test  covers  a span  of  several  years  such  as  reading 
tests.  It  is  for  this  reason  that  such  norms  are  not  used  more. 

Percentile norms are scores made on the test which indicate
the per cent of pupils who do not exceed these scores.
Thus  a percentile  norm  of  60  indicates  that  60%  of  the  pupils 
do  not  exceed  the  equivalent  score.  Another  popular  method 
of  reporting  test  norms  is  in  terms  of  medians  and  quartiles. 
The  median  is  merely  the  50th  percentile  and  indicates  that 
50%  of  the  pupils  did  not  exceed  such  a score  and  that  50% 
did  exceed  it.  The  lower  or  first  quartile  is  the  25th  percentile 
which  indicates  that  such  a score  is  not  exceeded  by  25%  of 
the  group.  The  upper  or  third  quartile  is  the  75th  percentile. 
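
The median and quartiles just defined can be found for a set of scores with Python's standard `statistics` module. The eleven raw scores below are hypothetical, chosen only to illustrate. Texts differ slightly in how they interpolate between scores, so the cut points given by `statistics.quantiles` should be taken as one reasonable convention and not the only one.

```python
import statistics

# Hypothetical raw scores for eleven pupils on one test.
scores = [42, 47, 51, 55, 58, 60, 63, 67, 70, 74, 81]

# With n=4 the function returns the three cut points that divide the
# group into quarters: lower quartile, median, upper quartile.
q1, median, q3 = statistics.quantiles(scores, n=4)

print("Lower quartile (25th percentile):", q1)
print("Median (50th percentile):", median)
print("Upper quartile (75th percentile):", q3)
```

For these eleven scores the three cut points fall on the third, sixth, and ninth scores in order of size.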

The disadvantage of percentile scores is that equal units
of  difference  do  not  indicate  equal  differences  in  ability.  There 
is  much  greater  difference  in  ability  between  a student  at 
the  95th  percentile  and  one  at  the  90th  than  there  is  between 
one  at  the  55th  and  50th. 
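
The inequality of percentile units can be shown numerically if one is willing to assume that ability follows the normal curve. That assumption is made here only for illustration and is not a claim of the text. Under it, each percentile corresponds to a distance from the mean in standard deviation units, and the two five-point gaps mentioned above can be compared directly. The helper name `gap_in_sd` is hypothetical.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1 (assumed shape)


def gap_in_sd(lower_pct, upper_pct):
    """Distance between two percentiles, in standard deviation units."""
    lower = unit_normal.inv_cdf(lower_pct / 100)
    upper = unit_normal.inv_cdf(upper_pct / 100)
    return upper - lower


upper_gap = gap_in_sd(90, 95)   # 90th to 95th percentile
middle_gap = gap_in_sd(50, 55)  # 50th to 55th percentile

print(round(upper_gap, 2), round(middle_gap, 2))
```

On this assumption the gap between the 90th and 95th percentiles is nearly three times the gap between the 50th and 55th, although both are five percentile points.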

There is not much choice as to what translated or transmuted
score will be used under present conditions. The published
percentile norms can be taken, or percentile ranks can
be computed on the group taking the test. For the method
of figuring such percentile ranks, see any good statistics book;
a very convenient chart is also published (22) which lightens
the labor.
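
Where percentile ranks are to be computed on the group taking the test, the arithmetic is simple enough to sketch in Python. The definition used here follows the one given earlier in this section (the per cent of pupils who do not exceed the score). Many statistics books count tied scores somewhat differently, so this is one convention and not the only one. The function name and the ten scores are hypothetical.

```python
def percentile_rank(score, group_scores):
    """Per cent of the group whose scores do not exceed the given score."""
    not_exceeding = sum(1 for s in group_scores if s <= score)
    return round(100 * not_exceeding / len(group_scores))


# Hypothetical raw scores for a class of ten pupils.
group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]

for raw in sorted(set(group)):
    print(raw, percentile_rank(raw, group))
```

A pupil with a raw score of 20 in this class would be entered with a percentile rank of 50, since five of the ten pupils do not exceed that score.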

There is a great need for improvement of the norms for
high school tests. There is no degree of uniformity, some
results being reported in terms of grade scores, some in
percentiles, and some in medians only. The tester who is giving
several different tests is faced with a real difficulty in
obtaining comparable results.

In  view  of  the  previous  discussion,  just  what  items  should 
be  recorded  on  the  permanent  record  card?  First,  (a)  the 
name  and  form  of  the  test,  (b)  date  given  in  terms  of  month 
and  year,  and  (c)  score  on  the  test.  Second,  some  transmuted 
score  is  desirable,  probably  percentile  rank. 
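
The items recommended for the permanent record card, together with the earlier requirement that the record be cumulative, can be pictured as a small data structure. The following Python sketch is hypothetical throughout: the class names, field names, test name, and scores are assumptions made for illustration and do not describe any actual record form.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScoreEntry:
    test_name: str                         # (a) name of the test
    form: str                              # (a) form of the test
    month: int                             # (b) date given: month
    year: int                              # (b) date given: year
    raw_score: int                         # (c) score on the test
    percentile_rank: Optional[int] = None  # transmuted score, if available


@dataclass
class PermanentRecord:
    pupil: str
    entries: List[ScoreEntry] = field(default_factory=list)

    def add(self, entry: ScoreEntry) -> None:
        self.entries.append(entry)

    def history(self, test_name: str) -> List[ScoreEntry]:
        """Entries for one test in order of date, so progress can be followed."""
        matches = [e for e in self.entries if e.test_name == test_name]
        return sorted(matches, key=lambda e: (e.year, e.month))


record = PermanentRecord("Mary Jones")
record.add(ScoreEntry("Reading Test", "B", 5, 1934, 61, 70))
record.add(ScoreEntry("Reading Test", "A", 9, 1933, 48, 55))

for entry in record.history("Reading Test"):
    print(entry.month, entry.year, entry.form, entry.raw_score, entry.percentile_rank)
```

Keeping every entry on one record for the pupil is what allows a later score to be related to earlier measurements of the same function.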

9.  What  Uses  are  Made  of  Standardized  Achievement 
Tests? 

The  list  of  uses  made  of  standardized  achievement  tests  is 
taken  from  the  uses  most  frequently  checked  by  administrat- 
ors and  teachers.  To  bring  out  the  differences,  there  are  sepa- 
rate lists  given  for  the  administrators  and  for  the  teachers 
in  the  same  manner  as  was  done  for  the  uses  of  intelligence 
tests. 

The main uses of the tests according to administrators are:

1.  To  compare  the  standing  of  the  school  with  the  test  norms.  (58%) 

2. To help in determining whether a pupil is working up to his or her
capacity. (49%)

3.  To  measure  progress  during  the  semester  or  year.  (42%) 

4.  To  stimulate  interest  on  the  part  of  the  teachers  in  the  improvement  of 
instruction.  (39%) 

5.  To  aid  in  studying  and  advising  failing  pupils.  (36%) 

6.  To  help  in  forming  ability  groups  within  a grade.  (32%) 

7.  To  aid  in  determining  the  promotion  of  pupils  from  one  grade  to  anoth- 
er. (34%) 

8.  To  compare  the  standing  of  classes  within  the  school.  (34%) 

9. To aid in determining which pupils are capable of doing exceptional
work. (30%)

10.  To  compare  the  standing  of  schools  within  the  system.  (27%) 

11.  To  bring  about  a better  understanding  of  the  capabilities  of  the  pupil 
when  discussing  educational  plans  of  the  pupils  with  the  parents.  (26%) 


12. To aid in determining which pupils will fail in a subject. (23%)

13. To satisfy parents that their children have been marked fairly. (23%)

14. To furnish information concerning the probable success a pupil will have
in a certain curriculum. (22%)

15. To furnish an estimate of the pupil’s probable success in college. (20%)

22. Otis, A. S. Percentile Graph. Published by the World Book Company,
Yonkers-on-Hudson, New York.

The  main  uses  of  standardized  achievement  tests  accord- 
ing to  the  teachers  are: 

1.  To  compare  the  results  attained  by  my  class  with  the  norms.  (20%) 

2.  To  compare  the  results  attained  in  two  or  more  of  my  classes.  (14%) 

3.  To  aid  in  determining  the  pupil’s  mark.  (13%) 

4.  To  show  pupils  in  what  part  of  the  subject  they  are  weak.  (12%) 

5.  To  stimulate  pupils  to  do  better  work.  (12%) 

6.  To  aid  in  determining  which  pupils  will  fail.  (10%) 

7.  To  discover  what  parts  of  the  topic  need  to  be  retaught.  (10%) 

8.  To  discover  what  parts  of  a topic  or  unit  need  to  be  taught.  (10%) 

9.  To  aid  in  discovering  which  pupils  are  capable  of  doing  exceptional 
work.  (9%) 

10. To enable me to tell whether poor work is due to lack of ability or to
other factors which can be corrected. (9%)

11.  To  aid  in  studying  and  advising  failing  pupils.  (8%) 

12.  To  satisfy  parents  that  their  children  have  been  marked  fairly.  (8%) 

13.  To  form  ability  groups  within  the  room.  (5%) 

14.  To  bring  about  a better  understanding  of  the  capabilities  of  the  student 
when  discussing  educational  or  vocational  plans  of  the  pupils  with  the 
parents. (5%) 

15.  To  furnish  an  estimate  of  the  pupil’s  probable  success  in  college.  (4%) 

It is interesting to note that the first and fifteenth uses
agree in both lists. The main use for standardized achievement
tests is to obtain comparisons. Judging from the teachers’ uses,
the other values lie in aiding in marking, diagnosing pupil
weaknesses, stimulating pupils, and locating weak spots in the
teaching.


SELECTED  REFERENCES 

Buckingham,  B.  R.  Research  for  Teachers.  New  York:  Silver,  Burdett  and 
Company,  1926,  Chapter  IV-V. 

Brief but helpful discussion of achievement tests and of combining
intelligence and achievement test results.

Commission  on  English.  Examining  the  Examination  in  English.  Cambridge: 
Harvard  University  Press,  1931,  Chapters  IX-XI. 

Considers the aims of English, the educational values of the examination
and offers recommendations. Especially valuable to teachers of English.

Douglass, Harl R. Modern Methods in High School Teaching. Boston:
Houghton Mifflin Company, 1926, Chapter XIII.

A brief  but  helpful  discussion  of  the  use  of  standardized  tests  in  high 
school  teaching. 




Henmon,  V.  A.  C.  Achievement  Tests  in  Modern  Foreign  Language.  New  York: 
The  Macmillan  Company,  1929,  363  pp. 

Report of the Modern Foreign Language Study. A most comprehensive
account of achievement tests in modern foreign language.

Kelley, Truman Lee. Interpretation of Educational Measurements.
Yonkers-on-Hudson, New York: World Book Company, 1927, Chapters III-IV.

Presents  an  exceedingly  helpful  discussion  and  illustration  of  interpreting 
achievement  test  scores. 

Koos,  L.  V.  and  Kefauver,  G.  N.  Guidance  in  Secondary  Schools.  New  York: 
The  Macmillan  Company,  1932,  Chapter  XI. 

Discusses  the  value  of  achievement  tests  in  guidance. 

Mort, Paul R. and Gates, Arthur I. The Acceptable Uses of Achievement Tests.
New York: Teachers College Bureau of Publications, Columbia University,
1932, Chapters I-IV.

Describes  uses  of  achievement  tests  for  comparisons,  diagnosis,  classification 
and  remedial  teaching.  More  suitable  for  junior  than  senior  high  school. 

Odell,  C.  W.  Traditional  Examinations  and  New-Type  Tests.  New  York:  The 
Century  Company,  1923,  Chapter  I. 

Contrasts  the  relative  advantages  of  teacher-made  and  standardized  tests. 

Odell, C. W. Educational Measurement in High School. New York: The Century
Company, 1930, 641 pp.

Chapters  IV  to  XIV  describe  standardized  achievement  tests  in  various 
subject  matter  fields.  Chapters  XXI  to  XXIII  discuss  the  use  of  such  tests. 

Ruch, G. M. The Objective or New Type Examination. Chicago: Scott,
Foresman and Company, Chapter VI.

Lists  relative  values  of  standardized  and  non-standardized  tests. 

Ruch,  G.  M.  and  Stoddard,  G.  D.  Tests  and  Measurements  in  High  School 
Instruction.  Yonkers-on-Hudson,  New  York:  World  Book  Company,  1927, 
Chapters  V through  XIII. 

Chapters V-X describe tests in various subject matter fields, while
Chapters XIX-XXIV discuss their use.

Symonds, Percival M. Ability Standards. New York: Teachers College Bureau
of Publications, Columbia University, 1927, 91 pp.

Contains  comparable  standards  for  nine  achievement  tests  at  various  times 
during  the  year. 

Tiegs,  Ernest  W.  Tests  and  Measurements  for  Teachers.  Boston:  Houghton 
Mifflin  Company,  1931,  470  pp. 

Chapters VI to XII discuss use of test results and Chapter XIX deals
with tests in secondary education.

Van  Wagenen,  M.  J.  Educational  Diagnosis.  New  York:  The  Macmillan  Com- 
pany, 1926,  276  pp. 

Offers  suggestions  for  handling  test  scores. 


CHAPTER  VI 

ADMINISTRATION  OF  TEACHER-MADE  TESTS 

Introduction 

Teacher-made tests are given about eight times as often
as are standardized achievement tests in the secondary schools,
according to the results from the sampling of sixteen hundred
teachers. If a knowledge of standardized achievement
tests is important to the administrator, how much more
important is knowledge about teacher-made tests. It is the
purpose of this chapter to discuss the problem briefly under the
following topics:

1.  What  constitutes  a good  test? 

2.  What  are  the  relative  merits  of  the  objective  and  essay  tests? 

3.  What  are  the  possibilities  of  cooperative  testing  in  the  school? 

4. What steps should be followed in constructing teacher-made tests?

5.  What  are  likely  to  be  the  faults  in  teacher-made  objective  tests? 

6.  What  provisions  should  be  made  for  duplicating  tests? 

7. What are the uses which are made of the objective and essay tests?

It is not the purpose of this chapter to treat fully all aspects
of teacher-made tests. Such a treatment could not be handled
in this limited space. A bibliography is given at the end of the
chapter which provides references for more detailed treatment
of the subject.

1.  What  Constitutes  a Good  Test? 

A good test is one that is valid, reliable and easy to give and
score. A test is valid when it measures what it is intended to
measure. A history test designed to measure the pupil’s
understanding of relationships would not be valid if it only tested
facts. A test should measure the principal objectives of the
course, not trivialities. The best check for the teacher in judging
the validity of the test is to be sure that the items selected
(1) measure the objectives of the course as nearly as possible
(see section 4 of this chapter), (2) are the more important ones
of the course, (3) parallel the actual teaching which has been
done, and (4) represent a wide sampling of the materials taught.

A test  is  reliable  when  it  consistently  measures  whatever 
it  measures.  In  other  words,  reliability  represents  the  degree 
of  confidence  which  can  be  placed  in  the  results.  There  are 
some  technical  ways  of  figuring  the  reliability  of  a test  (1), 
but  the  following  suggestions  will  do  much  toward  insuring 
that  a test  is  reliable.  First,  the  test  should  be  objectively 
scored.  The  teacher  should  get  the  same  results  if  she  scores 
the  test  a second  time  or  if  some  other  teacher  scores  the  test. 
Such  objectiveness  is  obtained  only  through  the  use  of  the 
objective  or  new-type  test  questions.  The  essay  or  discussion 
test  cannot  usually  be  scored  a second  time  without  showing 
considerable variation from the first scoring. Second, a test
should contain a large number of items. One hundred true-false
items, or a somewhat smaller number of items of other types,
usually give satisfactory reliability. Third, these items
should represent an extensive sampling of the material of the
course. If only one phase of a topic is tested, pupils who are
familiar with the other material of the course but weak on
the special one given would do poorly on the test.
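
One of the technical ways of figuring reliability can be sketched as follows. The sketch assumes the split-half method with the Spearman-Brown correction, which is a common choice and is not necessarily the procedure given in the references cited; the function names and the marks are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if every pupil has the same half score.
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per pupil, one 0/1 entry per item."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Hypothetical marks of five pupils on a six-item test (1 = right, 0 = wrong).
marks = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
reliability = split_half_reliability(marks)
```

With so few pupils and items the resulting figure means little; the example only shows the arithmetic. In practice the table would hold every pupil who took the test.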

A test  should  be  easily  administered  and  scored.  Much 
teacher-time  is  spent  correcting  test  papers  which  could  be 
more profitably spent in other instructional activities. The
use of the objective type question, and some care in setting
up the test so the responses are at the edge of the paper,
greatly facilitate scoring. Further, the use of the device referred
to in Chapter V, section 5, for pupil scoring will save
time.

The  three  points,  validity,  reliability  and  ease  of  scoring, 
constitute  the  three  phases  of  testing  which  must  be  provided 
for  to  insure  a good  examination. 

2.  What  Are  the  Relative  Merits  of  the  Objective  and  Essay 
Tests? 

There  are  two  types  of  tests  widely  used  by  teachers  today. 
They  are  the  old  type  essay  or  discussional  examination  and 

1. Ruch, G. M. and Stoddard, G. D. Tests and Measurements in High School
Instruction. Yonkers-on-Hudson, New York: World Book Company, 1927,
pp. 355-374.

Symonds, Percival M. Measurement in Secondary Education. New York:
The Macmillan Company, 1927, pp. 561-567.




the new-type or objective tests. Since McCall's (2) introduction
of the true-false test in 1920, much has been written,
both of a discussional and an experimental nature, concerning
the relative advantages of each type of test. Briefly, the
advantages claimed for each are as follows:

Advantages  of  new-type  or  objective  examinations: 

1. Wide sampling of items is possible.

2. Scoring is objective.

3. Scoring is economical.

4. Results are more reliable (due largely to items 1 and 2).

5. Does not confuse knowledge of subject with ability of expression.

6. Requires more attention on the part of the teacher to construct.

7. Requires discrimination of thinking.

Advantages  of  the  essay  examination: 

1. Tests the ability to organize knowledge and apply it.

2.  Enables  the  tester  to  ask  any  question,  not  limiting  questions  to  the  type 
that  can  be  scored  objectively. 

3.  Provides  for  language  training. 

4.  Is  economical  to  give  as  the  questions  can  be  written  on  the  blackboard. 

This brief summary presents advantages claimed exclusively
for each type of test and also can be considered as a list
of the difficulties or disadvantages of the other type. The outstanding
advantages of the new-type test, as the writer sees
them, are in the extensive sampling of the material covered
which can be obtained, and in the increased objectivity of scoring.
The advantages claimed for the essay test, that it makes
possible the organization of knowledge and language training,
would seem to some writers to be better arrived at through
the preparation of term papers. A very good case can be made
out for the detrimental effects upon written expression when
pupils are forced to write rapidly under pressure. The ridiculousness
of attempting to obtain reflective thinking and an
organization and application of the knowledge acquired in
one short hour can be made apparent.

Though  the  merits  of  the  objective  type  seem  to  outweigh 
those  of  the  essay  type,  there  is  undoubtedly  a place  for  both 
types  in  the  testing  program  of  the  high  school. 

The  practice  of  the  teachers  favors  objective  tests.  In  reply 
to  the  question,  “Of  what  do  most  of  your  self-made  tests 
consist?” nearly three-fourths (74%) indicated objective tests.
Only  16%  indicated  essay  and  10%  marked  both  essay  and 


2.  See  Chapter  II,  footnote  51. 




objective.  These  facts  are  indicative  of  the  great  change  which 
has  taken  place  in  our  testing  since  1920,  when  the  objective 
test  was  introduced. 

3.  What  Are  the  Possibilities  of  Cooperative  Testing  in 
the  School? 

The  construction  of  uniform  objective  tests  by  departments 
seems  to  have  possibilities  usually  overlooked.  In  most  schools 
each  teacher  constructs  her  own  tests  without  consulting  with 
the  other  teachers.  However,  the  advantages  of  cooperation 
are coming to be recognized. There are at least three outstanding
accounts of such cooperative work. The most extensive
one is Objective Tests (3) by Orleans and Sealy, dealing with
the development of such a cooperative testing program on the
elementary level. The others, by Gibbons (4) and Michell (5),
deal with similar experiments in the field of the social studies.

Michell,  in  discussing  the  values  of  such  testing,  indicates: 

“The cooperation among teachers in planning to cover requirements previous
to the test, the discussion of aim and basis for selecting material, the
acceptance of a distinctive core of minimum essentials for each term of study,
and equalizing the standards of teachers are among the definite gains ... If
uniform tests are to be used effectively, there must be a realization both by
the supervisor and the teachers that such tests can and should measure only
the minimum essentials for a term of study, and that a teacher’s most valuable
service to the department lies in the individual contribution that each
teacher gives to her classes in addition to covering the content agreed
upon” (6).

The steps which an administrator should follow in commencing
the use of uniform objective tests, as outlined by
Orleans and Sealy (7), are as follows:

1.  Conduct  enough  experimentation  to  convince  the  teachers  that  a real 
need  for  more  uniformity  exists. 

2.  Assure  adequate  preparation  for  making  tests  and  handling  the  results. 
The  use  of  teachers’  meetings  is  suggested  for  this  purpose. 

3. Construct the tests. See section 4 for an outline of the steps.

3. Orleans, Jacob S. and Sealy, Glenn A. Objective Tests. Yonkers-on-Hudson,
New York: World Book Company, 1928, 373 pp. Chapters IV to XII most helpful.

4. Gibbons, Alice N. Tests in the Social Studies. Published by the National
Council for the Social Studies, 1929, 144 pp.

5. Michell, Elene. Teaching Values in New-Type History Tests. Yonkers-on-Hudson,
New York: World Book Company, 1930, 179 pp. Chapter VII discusses
the problem.

6. Ibid., p. 122.

7. Orleans and Sealy, op. cit., p. 198.




4. Give the tests simultaneously in each class in the same subject whenever
possible.

5. Score the papers. This is done by having each pupil score his own (p. 92),
each teacher score her class, or by having the teachers meet together in
a “scoring bee.”

6. Tabulate the results and find the median and the 25th and 75th percentile
scores.

7.  Assist  in  interpreting  the  scores.  This  last  step  is  one  most  apt  to  be 
neglected,  yet  is  one  of  the  most  important. 
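
The tabulation called for in step 6 can be sketched as follows. This is a minimal illustration, assuming the scores from all classes have been pooled into one list; the scores shown are hypothetical, and the "inclusive" interpolation method is only one of several conventions for locating percentiles.

```python
from statistics import quantiles


def summarize_scores(scores):
    """Return the 25th percentile, the median, and the 75th percentile."""
    # quantiles(..., n=4) yields the three cut points that divide the
    # sorted scores into four equal groups.
    p25, median, p75 = quantiles(scores, n=4, method="inclusive")
    return {"p25": p25, "median": median, "p75": p75}


# Hypothetical scores pooled from every class that took the uniform test.
scores = [62, 88, 70, 40, 93, 55, 81, 60, 75]
summary = summarize_scores(scores)
```

The same summary computed class by class gives each teacher figures to set against those of the whole department when the scores are interpreted.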

A number of individual schools have used uniform department
tests with success. A common practice is to develop such
tests  on  a city-wide  basis.  Among  the  cities  doing  a great  deal 
in  this  direction  are  Atlanta,  Chicago,  Denver,  Detroit,  Los 
Angeles,  and  Philadelphia. 

4.  What  Steps  Should  Be  Followed  in  Constructing  Teacher- 
Made  Tests? 

The following brief outline of the steps to be taken in constructing
objective tests further suggests the need of cooperative
work. Much more time is spent in working out new
mediocre tests each year than would be spent if such examinations
were worked out carefully and then used from year to
year with such revisions as were indicated by the use of the
tests. The following outline is largely a combination of those
presented by Ruch (8) and Tyler (9). Tyler’s method offers
much hope for the improvement of measurement in providing
a technique for measuring other skills than fact and
information.

1.  Formulate  the  course  objectives.  If  these  have  already 
been  formulated  this  first  step  can  be  omitted.  Tyler  suggests 
eight  main  types  of  objectives  which  have  been  found  on 
the college level. These are (a) information; (b) reasoning,
including induction, testing hypotheses, and deduction; (c)
location of relevant data; (d) skills characteristic of the particular
subject; (e) knowledge of, ability to evaluate, and skill
in applying standards of technical performance; (f) reports,
referring to skill in reporting projects or experimentations;
(g) consistency in application of point of view; and (h) character.

8. Ruch, G. M. The Objective or New-Type Examination. Chicago: Scott,
Foresman and Company, 1929, pp. 149-181.

9. Tyler, Ralph W. “A Generalized Technique for Constructing Achievement
Tests.” Educational Research Bulletin, Ohio State University, 10:199-208,
April 15, 1931.




This first step shows what it is necessary to measure. All
of these objectives may not be measured by an objective test.
The challenge to the teacher is to devise some means of measuring
the various objectives. The omission of this first step has
resulted in most tests measuring only information.

2. Define each objective in terms of student behavior. The
method of making these definitions is illustrated by Tyler for
elementary zoology in the following:

“In defining the first objective, a fund of information about animal activities
and structures, the specific facts and general principles which the students
should be able to recall without reference to text books, or other sources of
information, were indicated. The second objective, an understanding of technical
terminology, was defined by listing the terms which the student himself
should be able to use in his own reports, and another list of terms which
he would not be expected to use, but should be able to understand when
he finds them in zoological publications. The third objective, an ability to
draw inferences from facts, that is, to propose hypotheses, was defined by
describing the types of experiments which an elementary student should be
able to interpret. The fourth objective, ability to propose ways of testing
hypotheses, was defined by listing the types of hypotheses which an elementary
student should be able to validate by experiment, or to propose ways of
validation. The fifth objective, an ability to apply principles to concrete situations,
was defined by listing the principles which elementary students should
be able to apply, and types of concrete situations in which the student might
apply these principles. The sixth objective, accuracy of observation, was defined
by listing the types of experiments in which elementary students should
be able to make accurate observations. The seventh objective, skill in the use
of the microscope and other essential tools, was defined by describing the
types of microscopic mounts and types of dissections which elementary students
should learn to make. The eighth objective, an ability to express effectively
ideas relating to zoology, was defined by indicating the nature of the
reports, both written and oral, which zoology students are expected to make
and the qualities demanded for these reports to be effective” (10).

These  situations  furnish  the  material  on  the  basis  of  which 
the  test  or  tests  are  to  be  constructed. 

That  this  step  is  a most  valuable  training  device  for  the 
teachers is at once apparent. Most teachers seldom consider
the  types  of  learning  situations  which  they  expect  will  result 
in  the  attainment  of  the  objectives.  This  type  of  approach  in 
forcing  the  teachers  to  consider  their  courses  in  relation  to 
the  objectives  is  concrete  and  requires  more  than  mere  “lip 
service”  on  the  part  of  the  teachers. 

3.  Drafting  items  or  situations  which  will  reveal  whether 


10.  Ibid.,  pp.  202-203. 




students have obtained the objectives. The data gathered together
under the second step provide the working material for
this  step.  The  items  should  be  put  in  the  form  most  convenient 
for  measuring.  This  will  prove  a challenge  to  the  combined 
ingenuity  of  the  members  of  each  department,  but  with  all 
working  on  the  problem,  they  should  be  able  to  produce  a 
set  of  preliminary  items  which  are  much  better  than  any  one 
teacher  could  produce. 

4.  Deciding  upon  the  length.  This  is  a practical  problem 
which  must  be  settled  by  the  members  of  the  department 
on  the  basis  of  the  amount  of  testing  material,  and  time.  If 
sufficient  material  is  gathered  together  it  is  often  advisable  to 
break the test into two forms. This makes a similar test available
for re-testing, testing of absentees, and use in alternate
years. 

5.  Revising  the  preliminary  items.  The  items  should  be 
carefully  gone  over  and  all  ambiguities  removed.  Such  faults 
as  are  suggested  in  the  next  section  as  being  most  likely  to 
occur should be carefully watched for and eliminated. If possible,
it is a decided advantage to have some other person who
is  familiar  with  the  subject  go  over  the  test  at  this  point.  Often, 
many  improvements  can  be  made  from  the  suggestions  of 
such  a person. 

6. Preparing of directions. Most writers on the objective
test suggest assembling all items of one type together and
placing at the beginning of each such set brief, concise directions.
The detail of the directions depends on how “test-wise”
the class is.

7.  Making  the  scoring  key.  The  responses  on  the  scoring 
key  should  be  parallel  with  the  answer  blanks  on  the  test. 
One  of  the  easiest  ways  to  make  a scoring  key  is  to  use  a blank 
copy  of  the  test.  This  is  especially  helpful  if  the  responses 
are  all  at  the  edge  of  the  paper. 

8.  Giving  the  test.  This  step  is  simple  enough  for  pencil 
and  paper  tests;  however,  if  other  types  of  test  situations  have 
been  developed,  it  may  be  the  source  of  some  difficulty  and 
should  be  carefully  planned  beforehand. 

9.  Scoring  the  tests.  Where  the  test  is  objective  it  can  be 
scored  with  a key.  Where  it  is  not  objective  the  teachers  in 
the  department  should  set  standards  for  evaluating  responses 
and  should  score  the  reactions  in  terms  of  these  standards. 




10. Development of more practical methods of measurement.
If some of the items on the test require subjective
evaluation,  it  may  be  possible  to  make  an  objective  question 
which  measures  the  same  ability.  Tyler  describes  how  an 
essay  question,  asking  students  to  propose  inferences  which 
may  be  deduced  from  the  results  of  experiment,  was  changed 
to  an  objective  question.  The  first  change  was  to  a multiple 
choice  type  of  question  where  the  student  was  told  to  check 
the  best  out  of  five  given  inferences.  This  type  of  objective 
testing  correlated  only  .38  with  the  essay  test.  Longer  lists 
were  constructed  in  which  each  inference  had  been  given  score 
values  by  three  instructors.  The  pupils  checked  here  for  the 
best  and  poorest  inference.  This  form  correlated  .85  with  the 
essay test and so was substituted for it.

11. Further validate the items by statistical means. If tests
are to be used from year to year, Symonds (11) suggests a
simple method of measuring the “differentiating power” of
each item. All test papers are saved until the end of the year
and then the papers of the highest and lowest twenty-five per
cent of the class, as judged by the final rating, are singled
out. The number of correct responses made by the top group
is contrasted with the number correct made by the lowest
group. The extent to which more of the top group had an
item correct than did the lower group is a measure of the
differentiating power of the item.

This  procedure  is  rather  laborious  but  valuable  if  the  same 
items  are  to  be  repeated.  The  teachers  would  not  necessarily 
have  to  make  this  analysis  at  the  end  of  the  year  but  could 
do  it  during  the  next  year. 

This  method  of  analysis  acts  as  a check  on  the  validity  of 
the  items  which  were  selected  to  meet  certain  objectives.  If 
any  of  the  items  should  be  answered  correctly  by  more  of 
the  poorer  students  than  the  better  students,  it  is  probably 
an  indication  that  the  objective  is  not  being  measured.  In  the 
case  of  a disagreement  between  the  statistical  validation  and 
the  validation  by  judgment,  the  item  should  be  studied  most 
carefully  and  either  revised  or  discarded. 
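
The comparison of the highest and lowest quarters described under step 11 can be sketched as follows. This is only one way of expressing the contrast, assuming it is reported as the difference in the proportion of each group answering the item correctly; the function name, the ratings, and the item records are hypothetical.

```python
def differentiating_power(final_ratings, item_correct, fraction=0.25):
    """Contrast the top and bottom groups on every item.

    final_ratings: one final rating per pupil.
    item_correct:  one row per pupil, one 0/1 entry per item.
    Returns, for each item, the proportion of the top group answering
    correctly minus the proportion of the bottom group doing so.
    """
    n_pupils = len(final_ratings)
    group_size = max(1, int(n_pupils * fraction))
    # Pupil positions ordered from highest to lowest final rating.
    order = sorted(range(n_pupils), key=lambda i: final_ratings[i], reverse=True)
    top, bottom = order[:group_size], order[-group_size:]
    powers = []
    for item in range(len(item_correct[0])):
        top_right = sum(item_correct[i][item] for i in top)
        bottom_right = sum(item_correct[i][item] for i in bottom)
        powers.append((top_right - bottom_right) / group_size)
    return powers


# Hypothetical class of eight pupils and a three-item test.
final_ratings = [95, 90, 80, 75, 70, 65, 50, 40]
item_correct = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
powers = differentiating_power(final_ratings, item_correct)
# Items with a zero or negative value are the ones to study and revise.
suspect_items = [number + 1 for number, p in enumerate(powers) if p <= 0]
```

A negative value corresponds to the case noted above, in which more of the poorer students than the better students answer the item correctly.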

As one studies the outline of eleven steps, the value of constructing
cooperative examinations, which can be used from

11. Symonds, Percival M. Measurement in Secondary Education. New York:
The  Macmillan  Company,  1927,  pp.  535-536. 




year to year, becomes increasingly apparent. This is due to
the labor involved in constructing a good test. The statement
does not imply that once an examination is constructed
it should remain static. Rather it should be considered as
suggestive of the need of improvement from year to year and
the need for change when new material is added to the course.

5. What Are Likely to Be the Faults of Teacher-Made
Objective Tests?

The  administrator  can  use  a knowledge  of  the  most  likely 
faults  of  teachers  in  constructing  objective  tests  as  a guide  in 
studying  the  teacher-made  objective  tests.  The  most  likely 
faults can be stressed before the tests are constructed with
the hope of eliminating them.

This  list  of  faults  is  obtained  principally  from  an  article 
by Baldwin Lee (12) who analyzed some SO examinations prepared
by high school teachers. He found that the most common
faults were:

1. Undue emphasis placed on factual content. The facts
themselves  are  apt  to  be  selected  for  testing  instead  of  the 
functional  applications  of  these  facts. 

2.  Poor  wording.  This  difficulty  was  most  apparent  in 
true-false  tests. 

3. Ambiguity. Questions may be interpreted in various
ways.  One  device  which  tends  to  eliminate  this  fault  is  to 
make  out  the  key  on  a separate  paper  at  the  same  time  the 
questions  are  drafted.  After  several  days  have  passed  make 
out  a new  key  to  the  examination.  Disagreements  are  apt  to 
indicate  that  the  statement  may  have  two  meanings. 

4.  Repetition  of  items.  The  same  item  is  apt  to  be  drafted 
into  two  types  of  questions,  for  instance  a true-false  and  a 
multiple  choice  question.  This  can  be  avoided  by  working 
through  the  material  drafting  the  items  into  whatever  forms 
are  most  appropriate,  then  bringing  together  all  of  each  type 
later.  If  the  material  is  first  studied  with  the  idea  of  picking 
out  all  material  for  true-false  statements,  then  reviewed  again 
for all multiple choice questions, such duplication will probably
occur.

12. Lee, Baldwin. “Some Faults Common in Informal Objective Tests Made
by High School Teachers.” Educational Administration and Supervision,
14:105-113, February, 1928.




5.  Involved  phraseology.  Difficult  words  or  phrases  in  the 
test  mean  that  the  test  is  apt  to  be  measuring  vocabulary 
or  reading  ability  instead  of  the  subject  which  is  being  tested. 

6.  One  statement  answering  another.  A careful  study  of 
the  items  after  the  test  is  complete  will  probably  eliminate 
this  difficulty. 

7.  Obvious  answer.  Such  a statement  is  so  much  “dead 
wood”  in  the  tests. 

8.  Unimportant  and  trivial  items.  If  the  test  is  constructed 
according to the specifications outlined in the previous section,
these items will largely be eliminated.

9.  Misspelling. 

10.  Obviously  wrong  responses  in  multiple-choice  test. 
When some of the several items listed as choices are obviously
wrong, they increase greatly the chance of guessing the
correct  response. 

11.  Dependent  items.  Questions  should  not  be  dependent 
upon previous questions. Each one should constitute a separate
unity.

12.  Unnecessary  modifiers.  All  unnecessary  modifiers 
should  be  eliminated  as  they  are  a source  of  doubt. 

13. Specific determiners. The most comprehensive list of
specific determiners appearing in high school teachers’ true-false
tests is given by Brinkmeir and Ruch (13). They state
that in the 375 tests which they examined they found
that (14):

a.  About  nine  out  of  ten  statements  containing  only  or  alone  are  false. 

b.  Four  out  of  five  statements  containing  all  are  false. 

c.  Four  out  of  five  statements  containing  no,  none  or  nothing  are  false. 

d.  Three  out  of  four  statements  containing  always  or  never  are  false. 

e.  Two  out  of  three  statements  containing  clauses  of  cause  or  reason  are  false. 

f.  Statements  containing  should  are  probably  more  often  true  than  false. 

g.  Three  out  of  four  statements  containing  may  or  expressing  possibilities 
are  true. 

h.  Three  out  of  four  statements  containing  such  words  as  most,  some,  often, 
generally,  etc.  are  true. 

i.  Four  out  of  five  statements  containing  enumerations  are  true. 

13. Brinkmeir, I. H. and Ruch, G. M. “Minor Studies in Objective Examination
Method. III. Specific Determiners in True and False Statements.” Journal
of Educational Research, 22:110-118, September, 1930.

14.  This  list  represents  a slight  rearrangement  of  the  one  given  in  the  cited 
reference. 




From  other  sources  the  following  are  found: 

j. Three out of four statements containing more than twenty words are
true. 

k. One out of five statements, due to circumstantiality of content, or mere
obviousness, can usually be recognized as true whether one has a knowledge
of the subject or not.

The  method  of  avoiding  the  pitfalls  of  these  determiners 
is  to  keep  the  list  in  mind  and  plan  to  use  them  as  often  with 
true  statements  as  false. 
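
A teacher who wishes to check a true-false test against this list might tally how often each determiner falls in true and in false statements. The following is a minimal sketch under stated assumptions: the word list is a partial selection from the determiners above, the statements and their keyed answers are hypothetical, and the function name is illustrative only.

```python
import re

# A partial selection of the specific determiners listed above.
DETERMINERS = [
    "only", "alone", "all", "no", "none", "nothing", "always", "never",
    "should", "may", "most", "some", "often", "generally",
]


def determiner_balance(statements):
    """Tally determiners by the keyed answer of the statement.

    statements: list of (text, keyed_true) pairs.
    Returns {determiner: (times_in_true, times_in_false)} for the
    determiners that occur at least once.
    """
    counts = {}
    for text, keyed_true in statements:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for word in DETERMINERS:
            if word in words:
                in_true, in_false = counts.get(word, (0, 0))
                if keyed_true:
                    counts[word] = (in_true + 1, in_false)
                else:
                    counts[word] = (in_true, in_false + 1)
    return counts


# Hypothetical draft items with the teacher's keyed answers.
draft = [
    ("All mammals lay eggs.", False),
    ("Water always boils at the same temperature.", False),
    ("Most metals conduct electricity.", True),
    ("Plants may grow without soil.", True),
    ("All squares are rectangles.", True),
]
balance = determiner_balance(draft)
```

A determiner whose two counts are far apart is one that a test-wise pupil could use as a clue, and the draft can be revised until the counts are roughly even.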

14.  Incorrect  provision  for  the  response.  The  questions 
should  be  arranged  for  ease  in  scoring,  such  as  placing  true- 
false  and  multiple  choice  responses  to  the  left,  etc. 

15.  Confusing  directions.  Directions  should  always  be  clear 
and  precise. 

16.  Faulty  scoring  key.  The  key  should  be  made  so  that 
when  placed  on  the  pupil’s  paper,  the  answer  will  appear 
beside  the  pupil’s  response. 

6.  What  Provisions  Should  Be  Made  for  Duplicating  Tests? 

Objective  tests  in  most  cases  need  to  be  duplicated  in  some 
form so that a copy is available for each pupil. Most secondary
schools of any size are equipped with some type of duplicating
machine. Where such equipment is available it seems
only  economy  to  use  it  in  preparing  tests. 

Some  administrators  feel  that  there  is  such  value  in  their 
teachers  giving  objective  tests  that  they  make  provisions  for 
clerical  help  in  the  cutting  and  running  of  stencils.  Where 
such  provision  is  not  made,  it  is  very  difficult  for  the  teacher 
to prepare adequate tests. The principal should evaluate clerical
activities in terms of the “improvement effected in the
learning situation,” and certainly good measuring devices result
in improved instruction.

Another  advantage  of  having  tests  mimeographed  at  the 
central  office  is  that  the  principal  is  able  to  obtain  sample 
copies with no trouble to the teachers. If the principal suggested
that the teacher furnish the office with a copy of tests
which  were  not  duplicated,  it  would  require  considerable 
extra  labor.  An  analysis  of  all  the  tests  given  over  a semester 
would  furnish  a picture  of  the  strengths  and  weaknesses  of 
the  faculty  in  test  construction.  Such  knowledge  would  afford 
the  starting  point  in  a program  for  improved  testing. 




7.  What  Are  the  Uses  Which  Are  Made  of  Objective  and 
Essay  Tests? 

Objective tests. The first list gives the uses which administrators
said were made of objective tests in their schools.
Remember that the per cents are per cents of the total number
of schools.

1. To aid in determining which pupils will fail in a subject. (40%)

2. To aid in determining the promotion of pupils from one grade to another.
(39%)

3.  To  measure  progress  during  the  semester  or  year.  (35%) 

4.  To  aid  in  studying  and  advising  failing  pupils.  (24%) 

5.  To  stimulate  interest  on  the  part  of  the  teachers  in  the  improvement  of 
instruction.  (23%) 

6. To help in determining whether a pupil is working up to his or her capacity.
(20%)

7.  To  aid  in  determining  graduation  from  senior  high  school.  (20%) 

8.  To  help  in  forming  ability  groups  within  a grade.  (18%) 

9.  To  compare  the  standing  of  classes  within  the  school.  (19%) 

10.  To  aid  in  determining  whether  pupils  should  be  recommended  to  college 
or  university.  (17%) 

11.  To  aid  in  determining  graduation  from  junior  high  school.  (16%) 

12.  To  bring  about  a better  understanding  of  the  capabilities  of  the  pupil  when 
discussing educational or vocational plans of the pupils with the parents.
(15%) 

13.  To  satisfy  parents  that  their  children  have  been  marked  fairly.  (14%) 

14.  To  help  in  forming  ability  groups  within  the  room.  (14%) 

15.  To  aid  in  determining  which  pupils  are  capable  of  doing  exceptional  work. 
(14%) 

16.  To  evaluate  a course  of  study.  (14%) 

Following  is  the  list  of  uses  which  the  teachers  said  they 
made  of  tests: 

1.  To  aid  in  determining  the  pupil's  mark.  (55%) 

2.  To  discover  what  parts  of  the  topic  need  to  be  retaught.  (53%) 

3.  To  show  pupils  in  what  part  of  the  subject  they  are  weak.  (47%) 

4.  To  stimulate  pupils  to  do  better  work.  (44%) 

5.  To  discover  what  parts  of  a topic  or  unit  need  to  be  taught.  (36%) 

6. To aid in determining which pupils will fail. (34%)

7.  To  compare  the  results  attained  in  two  or  more  of  my  classes.  (31%) 

8.  To  enable  the  teacher  to  tell  whether  poor  work  is  due  to  lack  of  ability 
or  to  other  factors  which  can  be  corrected.  (28%) 

9. To aid in discovering which pupils are capable of doing exceptional work.
(17%) 

10.  To  aid  in  studying  and  advising  failing  pupils.  (15%) 

The  principal  uses  of  objective  tests  consist  in  marking  and 
diagnosis  with  relatively  little  emphasis  on  guidance,  even 
of  the  failing  pupils. 




Essay  tests.  In  studying  the  uses  made  of  essay  tests,  there 
are  two  comparisons  which  should  be  noticed.  First,  essay 
and  objective  tests  are  used  largely  for  the  same  purposes. 
Second,  the  per  cents  of  either  schools  or  teachers  making 
use  of  essay  tests  are  much  smaller  than  are  similar  ones  for 
the  objective  tests. 

The  first  list  is  the  administrators’  list: 

1. To aid in determining which pupils will fail in a subject. (29%)

2.  To  aid  in  determining  the  promotion  of  pupils  from  one  grade  to  another. 
(28%) 

3.  To  measure  progress  during  the  semester  or  year.  (22%) 

4.  To  aid  in  determining  graduation  from  senior  high  school.  (17%) 

5.  To  aid  in  determining  whether  pupils  should  be  recommended  to  college 
or  university.  (14%) 

6.  To  aid  in  studying  and  advising  failing  pupils.  (13%) 

7.  To  aid  in  determining  graduation  from  junior  high  school.  (10%) 

8.  To  bring  about  a better  understanding  of  the  capabilities  of  the  pupil 
when  discussing  educational  or  vocational  plans  of  the  pupils  with  the 
parents.  (10%) 

9.  To  help  in  determining  whether  a pupil  is  working  up  to  his  or  her 
capacity.  (10%) 

Following  is  the  list  of  teachers’  uses: 

1.  To  aid  in  determining  the  pupil's  mark.  (30%) 

2.  To  discover  what  parts  of  a topic  need  to  be  retaught.  (21%) 

3.  To  show  pupils  in  what  part  of  the  subject  they  are  weak.  (20%) 

4.  To  stimulate  pupils  to  do  better  work.  (20%) 

5. To aid in determining which pupils will fail. (18%)

6.  To  enable  the  teacher  to  tell  whether  poor  work  is  due  to  lack  of  ability 
or  to  other  factors  which  can  be  corrected.  (14%) 

7. To discover what parts of a topic or unit need to be taught. (12%)

8. To aid in discovering which pupils are capable of doing exceptional work. (12%)

9.  To  compare  the  results  attained  in  two  or  more  of  my  own  classes.  (9%) 


SELECTED  REFERENCES 

Alberty, H. B. and Thayer, V. T. Supervision in the Secondary School. Boston:
D. C. Heath and Company, 1931, Chapter XV.

Sounds a caution regarding the type of tests used to measure achievement.

Buckingham, B. R. Research for Teachers. New York: Silver, Burdett and
Company,  1926,  Chapter  VI. 

Brief  but  helpful  discussion  of  new-type  tests. 

Gibbons,  Alice  M.  Tests  in  the  Social  Studies.  Published  by  the  National 
Council  for  the  Social  Studies,  1929,  144  pp. 

An account of experimenting with new-type tests in Rochester, New York.



Hopkins,  L.  Thomas.  The  Construction  and  Use  of  Objective  Examinations. 
Boulder,  Colorado:  University  of  Colorado,  1926,  119  pp. 

Consists  of  sample  true-false,  completion,  and  multiple  choice  tests. 

Lang,  Albert  R.  Modern  Methods  in  Written  Examinations.  Boston:  Houghton 
Mifflin  Company,  1930,  Chapters  III-X. 

Describes  the  various  types  of  teacher-made  tests. 

Lee,  J.  Murray  and  Symonds,  Percival  M.  “New-Type  or  Objective  Tests:  A 
Summary  of  Recent  Investigations.”  Journal  of  Educational  Psychology 
24:21-38,  January,  1933. 

Lee, J. Murray and Symonds, Percival M. “New-Type or Objective Tests: A
Summary of Recent Investigations (October, 1931-October, 1933).” Journal of
Educational Psychology 25:161-184, March, 1934.

These two articles summarize all the research studies on objective tests from
1929 to 1933. Studies previous to 1923 are summarized by Ruch.

Michell, Elene. Teaching Values in New-Type History Tests. Yonkers-on-Hudson,
New York: World Book Company, 1930, 179 pp.

Excellent treatment of new-type tests especially suitable for the teacher
of social studies. Chapter VII deals with cooperative testing.

Odell,  C.  W.  Traditional  Examinations  and  New-Type  Tests.  New  York:  The 
Century  Company,  1928,  469  pp. 

Provides  a very  complete  treatment  of  teacher-made  tests. 

Odell,  C.  W.  Educational  Measurement  in  High  School.  New  York:  The 
Century  Company,  1930,  Chapter  XX. 

A brief  summary  of  the  material  included  in  the  previous  reference. 

Orleans, Jacob S. and Sealy, Glenn A. Objective Tests. Yonkers-on-Hudson,
New York: World Book Company, 1928, 373 pp.

Reports  a cooperative  experiment  in  the  construction  of  objective  tests. 
Chapter  XII  is  especially  helpful  to  the  administrator  who  wishes  to  carry 
out  cooperative  testing. 

Paterson,  Donald  G.  Preparation  and  Use  of  New-Type  Examinations.  Yonkers- 
on-Hudson,  New  York:  World  Book  Company,  1925,  87  pp. 

A good  treatment  of  the  new-type  examination. 

Ruch,  G.  M.  The  Objective  or  New-Type  Examination.  Chicago:  Scott,  Fores- 
man  and  Company,  1929,  478  pp. 

Discusses  the  new-type  examination  very  completely.  Includes  a summary 
of  experimental  evidence  available.  Especially  valuable. 

Ruch,  G.  M.  and  Rice,  G.  A.  Specimen  Objective  Examinations.  Chicago:  Scott, 
Foresman  and  Company,  1930,  324  pp. 

The  best  collection  of  specimen  objective  examinations  published  for  the 
secondary  schools.  Rich  in  suggestions  for  teachers  constructing  their  own 
tests. 

Ruch,  G.  M.  and  Stoddard,  G.  D.  Tests  and  Measurements  in  High  School 
Instruction.  Yonkers-on-Hudson,  New  York:  World  Book  Company,  1927, 
Chapters  XIV-XVI. 

Considers  briefly  various  kinds  of  new-type  objective  questions  and  cer- 
tain scoring  considerations.  For  a detailed  treatment  see  Ruch’s  Objective 
or  New-Type  Examination. 



Russell, Charles. Classroom Tests. Boston: Ginn and Company, 1926, 346 pp.

Part I outlines methods of constructing objective tests while Part II deals
with how to use such tests.

Symonds, Percival M. Measurement in Secondary Education. New York: The
Macmillan Company, 1927, Chapter XXV.

A brief discussion of the use of the new-type test in teaching.

Tiegs, Ernest W. Tests and Measurements for Teachers. Boston: Houghton
Mifflin Company, 1931, Chapters IV, XIII, and XIV.

Excellent  treatment  of  various  phases  of  objective  testing. 

Tyler, Ralph W. “A Generalized Technique for Constructing Achievement
Tests.” Educational Research Bulletin, Ohio State University, 10:199-208,
April 15, 1931.

Contains most constructive suggestions for the future development of
new-type tests. Outlines methods of objective test construction followed
at Ohio University.

Wood, Ben D. Measurement in Higher Education. Yonkers-on-Hudson, New
York: World Book Company, 1923, Chapters VIII-XII.

A discussion of new-type tests with examples taken from college subjects.


CHAPTER  VII 


THE  ADMINISTRATION  OF  TESTING 
A BRIEF  SUMMARY 

The purpose of this chapter is to outline briefly the administration
of testing in the secondary schools. This program has
been  developed  from  the  practice  of  junior,  senior,  and  six 
year  high  schools  as  shown  by  questionnaire  returns  from 
493  schools;  experimental  evidence  wherever  available;  and 
expert  opinion  as  found  in  the  literature  on  testing. 

This  outline  summarizes  the  findings  of  the  investigation 
and  can  be  used  as  a check  list  by  administrators  or  those 
having  charge  of  the  testing  program  after  becoming  familiar 
with  the  material  as  presented  in  more  detail  in  the  previous 
chapters.  Where  any  question  arises,  it  would  be  advisable 
to  read  the  original  presentation. 

Importance.  The  importance  of  measurement  in  secon- 
dary education  lies  in  its  possible  contributions  to  the  evalu- 
ation of  instruction,  to  the  improvement  of  guidance,  to  the 
recognition  of  individual  differences,  to  the  improvement  of 
supervision,  to  the  diagnosis  of  personality  and  conduct,  to 
the  increased  validity  and  reliability  of  marking,  to  the  evalu- 
ation of  research,  and  to  the  determination  of  the  extent  to 
which  objectives  have  been  attained. 

Development.  The  present  measurement  program  began 
near  the  beginning  of  the  century.  Standardized  testing  in 
the  high  school  dates  from  1916.  Since  that  time  tests  have 
been  published  in  increasing  quantities  until  there  are  now 
over  610  tests  commercially  available  for  use  in  the  secondary 
schools.  These  cover  practically  every  subject. 

A.  Planning  the  Program  (Chapter  III) 

1.  Types  of  tests  available 
a.  Standardized  tests 



(1)  Intelligence 

(a)  Group  (given  in  84%  of  the  schools  report- 
ing) 

(b)  Individual  (given  in  52%  of  the  schools) 

(2)  Achievement  (Given  in  83%  of  the  schools) 

(3) Aptitude or Prognosis (Used in 23% of the schools)

(4)  Rating  Scales  (Used  by  12%  of  the  schools) 

(5)  Character  or  Personality  (Used  by  3%  of  the 
schools) 

(6) Questionnaire Blanks

(a)  Adjustment 

(b)  Attitude 

(c) Interest

b. Teacher-made tests

(1)  Objective  (Used  largely  by  74%  of  the  teachers) 

(2)  Essay  (Used  largely  by  16%  of  the  teachers) 

2.  Responsibility  for  selection  of  standardized  tests 

a.  Intelligence  tests 

(1)  Whenever  possible  by  research  director  (fol- 
lowed in  41%  of  the  schools) 

(2)  Otherwise  by  someone  with  sufficient  knowl- 
edge to  make  a suitable  selection  (Principal  in 
22%  of  the  schools) 

b.  Achievement  tests 

(1)  Use  of  an  approved  list  in  large  cities  (done  by 
a third  of  the  research  departments  in  cities 
over  100,000  population) 

(a)  Department  head  selects  from  list 

(b)  Teacher  selects  from  list. 

(2)  In  small  localities  persons  most  familiar  with 
tests  should  select  or  at  least  advise  in  selection. 
Might  be 

(a)  Department  head  or  chairman 

(b)  Teacher 

(c)  Principal 

(d)  Vice-Principal 

(e)  Research  director 

(f)  Counselor 



(3) Where no research department exists, the principal
should have sufficient knowledge to approve
selection.

c.  Other  tests 

(1)  Persons  most  familiar  with  tests 

(a)  Research  director 

(b)  Counselor 

(c)  Principal 

(d)  Vice-Principal 

B.  Administration  of  Intelligence  Tests  (Chapter  IV) 

1.  Per  cent  to  be  tested 

a. Mental test results should be available on each pupil
(only 47% of the schools have results on all pupils)

2.  Frequency  of  giving  intelligence  tests 

a.  Experimental  evidence  indicates  the  need  of  testing  at 
intervals  of  several  years 

b.  Practice  and  opinion  also  indicate  the  advisability  of 
testing  at  crucial  points 

3.  Grades  in  which  intelligence  tests  should  be  given 

a.  Crucial  points  for  testing  seem  to  be  just  before  or 
after  entrance  to  the  seventh  grade  for  the  junior  and 
six  year  schools,  the  ninth  grade  for  the  four  year  high 
school,  the  tenth  year  for  the  senior  high  school.  In 
addition  tests  should  be  given  in  the  six  year  high 
school  at  the  place  where  differentiated  curricula  be- 
gin, probably  at  the  beginning  of  the  ninth  grade 

4.  Time  of  year  to  give  intelligence  tests 

a.  Depends  on  use  to  be  made  of  results 

(1)  End  of  year  if  results  are  to  be  used  in  registra- 
tion at  beginning  of  following  year  (Followed 
by  11%  of  the  schools) 

(2)  Beginning  of  year  (Followed  by  42%  of  the 
schools) 

5.  Types  of  pupils  requiring  individual  intelligence  tests 

a. Pupils deviating in some measure from normal (See
p. 50 for the different types of cases which are tested in
the schools surveyed)

b. Wide variation in types of cases suggests need of further
study of problem pupils other than supplied by
individual test (List of helpful references suggested
p. 51)

6.  Criteria  for  selecting  intelligence  tests 

a.  Validity 

b.  Reliability 

c.  Range  of  ability  which  the  test  will  measure 

(1)  Test  manual  states  suitable  grades 

(2)  Norms  of  the  test  provide  a clue 

(3)  Most  suitable  range  is  usually  in  the  area  near 
the  grade  equivalent  to  one-half  the  total  score 

d.  Easy  to  give  and  score  and  relatively  inexpensive 
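
The rule of thumb in item 6c(3) can be illustrated with a short sketch. It assumes a test manual that lists a norm score for each grade; the function name and the norm figures are hypothetical and are not taken from any published test.

```python
from typing import Dict


def most_suitable_grade(grade_norms: Dict[int, float], total_score: float) -> int:
    """Return the grade whose norm lies closest to one-half the total score."""
    half = total_score / 2
    return min(grade_norms, key=lambda grade: abs(grade_norms[grade] - half))


# Hypothetical norms for an imaginary 180-point test (illustration only).
example_norms = {7: 62, 8: 76, 9: 91, 10: 103, 11: 112, 12: 120}
print(most_suitable_grade(example_norms, 180))
```

With these invented norms the grade nearest the half-score of 90 is the ninth, so the test would probably measure best in the grades surrounding it.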

7. Comparableness of results of different intelligence tests

a.  Results  of  different  tests  give  different  I.  Q.s 

b.  Tests  can  be  equated  by  means  of  tables  (see  refer- 
ences to  Kefauver  and  Runnel) 

c.  Two  tests  used  together  do  not  necessarily  decrease 
the  errors  of  the  I.  Q.  It  depends  on  the  tests  used. 

8.  Persons  who  should  give  intelligence  tests 

a.  Individual  tests  should  be  given  by  an  experienced 
examiner. 

b.  Group  tests  can  be  given  by  teachers  after  some  train- 
ing. It  would  appear  advisable  in  some  cases  to  have 
one  person  in  the  building  give  such  tests 

c.  Seems  advisable  that  the  principal  in  small  communi- 
ties be  sufficiently  trained  to  give  individual  intelli- 
gence tests 

9.  Scoring  the  tests 

a.  Scoring  intelligence  tests  is  a clerical  job.  Each  school 
must  work  out  the  most  efficient  system  possible  for 
getting  such  a clerical  job  done 

b.  Errors  in  scoring  occur  so  frequently  that  the  indi- 
vidual items  should  be  sampled  by  some  other  person 
to  see  if  mistakes  occur  and  then  all  other  processes 
should  be  checked  a second  time 
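
The checking routine in item 9b might be organized as in the following sketch. It is only one possible arrangement: a second person re-scores a random sample of items, and any item on which the two scorers disagree is listed for correction. The function names and the sample data are hypothetical.

```python
import random
from typing import Dict, List


def pick_items_to_recheck(item_numbers: List[int], sample_size: int, seed: int = 0) -> List[int]:
    """Draw a random sample of item numbers for a second person to re-score."""
    rng = random.Random(seed)
    size = min(sample_size, len(item_numbers))
    return sorted(rng.sample(item_numbers, size))


def scoring_disagreements(first: Dict[int, int], second: Dict[int, int]) -> List[int]:
    """List the re-scored items whose credit differs from the first scoring."""
    return sorted(item for item, credit in second.items() if first.get(item) != credit)


# Hypothetical credits (1 = right, 0 = wrong) given by two scorers.
first_scoring = {1: 1, 2: 0, 3: 1, 4: 1}
second_scoring = {2: 1, 4: 1}
print(scoring_disagreements(first_scoring, second_scoring))
```

A fixed seed is used only so that the sample can be reproduced; in practice any random draw would serve.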

10.  Transmuted  score  to  be  used 

a.  I.  Q.  most  used  (in  97%  of  the  schools) 

b.  Percentiles  recommended  in  the  literature 



11.  Computing  I.  Q. 

a. A maximum chronological age is used in computing
I. Q.s of older pupils. Some writers recommend 16,
some 14, and some favor a compromise of 15

b. Actual computation time can be decreased through
use of tables (see pp. 68-71)
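
The effect of the maximum chronological age in item 11a can be shown with a brief sketch. It assumes the customary ratio I. Q. (mental age divided by chronological age, times 100), with ages expressed in months; the function name and the example ages are hypothetical.

```python
def ratio_iq(mental_age_months: int, chronological_age_months: int,
             max_ca_years: int = 16) -> int:
    """Ratio I. Q. = 100 * MA / CA, with CA held at a maximum for older pupils."""
    capped_ca = min(chronological_age_months, max_ca_years * 12)
    return round(100 * mental_age_months / capped_ca)


# A hypothetical pupil aged 18 years (216 months) with a mental age of 15 years.
for maximum in (14, 15, 16):
    print(maximum, ratio_iq(180, 216, max_ca_years=maximum))
```

For this invented pupil the three maxima give I. Q.s of 107, 100, and 94, which shows how much the choice of maximum age can matter for older pupils.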

12.  Recording  of  test  scores 

a. Test records should be recorded in a manner which
will provide for cumulativeness and unity in the record
(provided for in only 62% of the schools)

b.  Items  which  should  be  recorded  are: 

(1)  Month  and  year  test  was  given 

(2)  Name  of  the  test 

(3)  Form  of  the  test 

(4)  Raw  score 

(5)  I.  Q. 
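
A cumulative record holding the five items listed under 12b might be sketched as follows. The class and field names are hypothetical, as are the test name and scores; the sketch merely keeps each pupil's entries together in order of date.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ScoreEntry:
    month: int          # month the test was given
    year: int           # year the test was given
    test_name: str      # name of the test
    form: str           # form of the test
    raw_score: int      # raw score
    iq: Optional[int] = None  # I. Q., when one has been computed


@dataclass
class CumulativeRecord:
    pupil: str
    entries: List[ScoreEntry] = field(default_factory=list)

    def add(self, entry: ScoreEntry) -> None:
        """Add an entry and keep the record in chronological order."""
        self.entries.append(entry)
        self.entries.sort(key=lambda e: (e.year, e.month))

    def latest(self) -> Optional[ScoreEntry]:
        """Return the most recent entry, or None if the record is empty."""
        return self.entries[-1] if self.entries else None


# Hypothetical pupil and test data, entered out of order.
record = CumulativeRecord("Pupil A")
record.add(ScoreEntry(9, 1933, "Example Group Test", "B", 118, 104))
record.add(ScoreEntry(9, 1931, "Example Group Test", "A", 97, 101))
print([(e.year, e.form) for e in record.entries])
```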

13.  Persons  to  whom  intelligence  test  results  should  be  avail- 
able 

a.  Teachers  should  be  given  test  results  in  some  form 
(done  in  97%  of  the  schools  which  give  intelligence 
tests) 

b. Consensus of opinion indicates that all parents and
pupils should not be informed of results (only 3% of
the schools inform all parents and only 1% all pupils)

14.  Uses  of  test  results 

a.  For  a detailed  list  see  pages  77  and  78 

C.  Administration  of  Achievement  Tests  (Chapter  V) 

1.  Standardized  achievement  vs.  teacher-made  tests 

a.  The  peculiar  advantages  of  each  type  make  it  advisable 
to  provide  for  both  in  the  school 

b.  Evidence  indicates  that  standardized  tests  are  usually 
valid  and  usually  have  a higher  degree  of  reliability 
than  teacher-made  tests 

c. Teachers usually prefer their own self-made tests to
standardized tests (Preferred by 60% of those replying
to question)

d.  Differences  in  use  exist  between  the  two  types  which 
make  it  advisable  to  use  both 



e.  Though  teachers  usually  prefer  their  own  tests,  they 
would  like  the  same  tests  used  throughout  the  school 
and  to  a lesser  degree  throughout  the  system  and  state 

2. Relative emphasis placed on standardized tests by the various
departments

a.  English  ranks  first  and  mathematics  second  in  the  total 
amount  of  standardized  achievement  testing  done 

b.  The  commercial  department  gives  more  standardized 
tests  and  teacher-made  tests  per  teacher  than  any  other 
department 

c.  The  science  and  social  studies  teachers  have  the  high- 
est ratio  of  teacher-made  tests  to  standardized  tests 

3.  Time  of  year  to  give  standardized  tests 

a.  End  of  semester  to  measure  achievement 

b.  Beginning  of  semester  for  diagnosis  in  second,  third 
or  fourth  year  of  continuous  subjects  such  as  Latin 

4.  Persons  who  should  give  achievement  tests 

a.  Standardized  tests  should  usually  be  given  by  the 
teachers 

5.  Scoring  the  tests 

a.  It  is  generally  agreed  that  teachers  should  score  such 
tests  (and  they  do  in  nearly  four-fifths  of  the  schools) 

b. Experimental evidence indicates that pupils scoring
their own papers is most productive of learning

c. A method has been suggested by which pupils can
score their papers, yet which eliminates possibilities of
cheating

6.  Persons  to  whom  achievement  test  results  should  be  avail- 
able 

a.  There  is  agreement  that  the  teacher  should  have  such 
results  (available  to  classroom  teachers  in  98%  of 
schools) 

b.  Evidence  from  studies  in  motivation  indicates  that 
pupils  should  have  results  (such  practice  is  at  present 
limited  to  17%  of  the  schools) 

c. Making results available to parents presents difficulties
of interpretation (though in 15% of the schools
such results are available to all parents). Present marks
seem sufficient



7.  Recording  of  test  scores 

a. Cumulativeness and unity required as for intelligence
tests (Such provision is made in only one-third of
the schools)

8.  Transmuted  scores  to  be  used 

a.  Grade  norms  wherever  available 

b.  Percentile  norms 

c.  Record  on  card  the  following  data 

(1)  Month  and  year  test  given 

(2)  Name  and  form  of  test 

(3)  Score  (raw  score) 

(4)  Some  transmuted  score 

9.  Uses  of  test  results 

a.  For  a detailed  list  see  pages  99  and  100 

D.  Administration  of  Teacher-Made  Tests  (Chapter  VI) 

1.  Qualities  of  a good  teacher-made  test 

a.  Valid 

b.  Reliable 

c.  Easy  to  give  and  score 

2.  Merits  of  objective  and  essay  test 

a.  Objective 

(1)  Wide  sampling  of  items 

(2)  Objective  scoring 

(3)  Economical  scoring 

(4)  Results  more  reliable 

(5)  No  confusion  of  knowledge  with  express- 
ion 

(6)  Requires  more  care  in  construction 

(7)  Requires  discrimination  of  thinking 

b.  Essay 

(1)  Requires  ability  to  organize  knowledge 

(2)  Permits  use  of  any  question 

(3)  Provides  language  training 

(4)  Economical  to  give 

3.  Plan  of  construction  of  cooperative  tests 

a.  Convince  teachers  of  need 

b.  Train  teachers  in  making  tests 

c.  Construct  the  tests 


d.  Give  the  tests 

e.  Score  the  tests 

f.  Tabulate  results 

g. Assist in interpreting the scores

4.  Construction  of  tests 

a.  Formulate  the  course  objectives 

b.  Define  each  objective  in  terms  of  student  behavior 

c.  Draft  these  behaviors  into  test  items 

d.  Decide  upon  the  length 

e.  Revise  the  preliminary  items 

f.  Prepare  the  directions 

g.  Make  the  scoring  key 

h.  Give  the  test 

i.  Score  the  tests 

j.  Develop  more  practical  methods  of  measurement 

k.  Further  validate  the  items  by  statistical  means 

5.  Probable  faults  of  teacher-made  examinations 

a.  Undue  emphasis  on  factual  content 

b.  Poor  wording 

c.  Ambiguity 

d. Repetitions of items

e.  Involved  phraseology 

f.  One  statement  answering  another 

g.  Obvious  answer 

h.  Unimportant  and  trivial  items 

i.  Misspelling 

j.  Obviously  wrong  responses  in  a multiple  choice  test 

k.  Dependent  items 

l. Unnecessary modifiers

m.  Specific  determiners  in  true-false  tests 

n.  Incorrect  provision  for  the  response 

o.  Confusing  directions 

p.  Faulty  scoring  key 

6.  Provisions  for  duplicating  tests 

a.  Done  by  clerical  help  whenever  possible 

7.  Uses  of  test  results 

a.  For  a detailed  list  see  pages  113  and  114 


Vita 


J. Murray Lee was born in Spokane, Washington, October 25, 1904. His
undergraduate work was taken at Occidental College, Los Angeles, where he
received the Bachelor of Arts degree in 1926. His graduate work included courses
at the University of Southern California, Stanford University, and two years at
Teachers College, Columbia University; he received the Master of Arts degree
from the latter institution in 1928. His professional experience includes teaching
in the secondary schools, directing research in the public schools, administrative
experience in the elementary schools, and instruction in research and
measurements on the college level.

He is author of the Lee Test of Algebraic Ability and co-author of the Lee
Test of Geometric Aptitude, Lee Maintenance Drills and Tests in Arithmetic,
Lee-Clark Reading Readiness Test, Lee-Clark Reading Tests-Primer and
First Reader, and New-Type or Objective Tests, Summary of Recent
Investigations, appearing at intervals in the “Journal of Educational Psychology.” He
has also contributed articles to several educational periodicals.


